Gemma 4 31B IT is Google's open-weight dense model with 31B parameters, all active during inference. Built on the Gemini 3 architecture, it targets higher output quality than its MoE sibling, with support for function-calling, structured JSON output, native vision, and 140+ languages.
import { streamText } from 'ai'
const result = streamText({ model: 'google/gemma-4-31b-it', prompt: 'Why is the sky blue?' })
What To Consider When Choosing a Provider
- Configuration: As a dense model with all parameters active, Gemma 4 31B IT uses more compute per token than the MoE Gemma 4 26B variant. Factor in the higher per-request cost and latency when evaluating provider variants for production traffic.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
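If you call the gateway without the AI SDK, the authentication point above amounts to attaching one credential to each request. The following is a minimal sketch, not the gateway's documented client: the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, the `Bearer` scheme, and the helper `gatewayAuthHeader` are all assumptions made for illustration, so confirm them against the AI Gateway documentation.

```typescript
// Sketch only: the variable names and the Bearer scheme are assumptions,
// not details taken from this page.
type Env = Record<string, string | undefined>;

// Hypothetical helper: picks one credential and builds an Authorization header.
function gatewayAuthHeader(env: Env): { Authorization: string } {
  // Prefer an explicit API key, then fall back to an OIDC token.
  // `||` is used so that an empty string counts as "not set".
  const credential = env.AI_GATEWAY_API_KEY || env.VERCEL_OIDC_TOKEN;
  if (!credential) {
    throw new Error('No AI Gateway API key or OIDC token found');
  }
  return { Authorization: `Bearer ${credential}` };
}

const exampleHeaders = gatewayAuthHeader({ AI_GATEWAY_API_KEY: 'example-key' });
console.log(exampleHeaders.Authorization);
```

No provider credential appears in the sketch, which matches the point that AI Gateway manages those for you.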
When to Use Gemma 4 31B IT
Best For
- Quality-critical generation tasks: You need the strongest output in the Gemma 4 family and can accept higher latency
- Complex reasoning and analysis: Multi-step planning, code generation, and detailed document analysis
- Multilingual applications: Serving users across 140+ languages with a single model
- Vision-language tasks: Image understanding, visual Q&A, and document parsing within a context window of 262.1K tokens
Consider Alternatives When
- Latency and throughput come first: Your main constraints are response time and request volume, which favor the MoE Gemma 4 26B because it activates fewer parameters and responds faster
- Native image or audio generation: You need media output, which Gemma 4 31B IT does not support
- High-volume low-complexity inference: A smaller or lighter model is more cost-effective
- Proprietary-grade benchmark performance: Gemini 3 Pro may be a better fit for the most demanding benchmarks
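The two lists above can be read as a simple decision rule. The sketch below encodes them as a function. The `Needs` fields, the `suggestModel` name, and the order in which the checks run are illustrative assumptions, and the returned strings are display labels rather than gateway model IDs.

```typescript
// Hypothetical requirement flags, one per "Consider Alternatives When" bullet.
type Needs = {
  needsMediaOutput: boolean;        // native image or audio generation
  needsTopBenchmarkScores: boolean; // proprietary-grade benchmark performance
  highVolumeSimpleTasks: boolean;   // high-volume, low-complexity inference
  latencyCritical: boolean;         // latency and throughput come first
};

type Suggestion =
  | 'Gemma 4 31B IT'
  | 'MoE Gemma 4 26B'
  | 'a smaller or lighter model'
  | 'Gemini 3 Pro'
  | 'a model with media output';

// The priority order here is an assumption, not a rule stated on this page.
function suggestModel(needs: Needs): Suggestion {
  if (needs.needsMediaOutput) return 'a model with media output';
  if (needs.needsTopBenchmarkScores) return 'Gemini 3 Pro';
  if (needs.highVolumeSimpleTasks) return 'a smaller or lighter model';
  if (needs.latencyCritical) return 'MoE Gemma 4 26B';
  // None of the alternative cases apply: the quality-focused dense model fits.
  return 'Gemma 4 31B IT';
}

const qualityFirst: Needs = {
  needsMediaOutput: false,
  needsTopBenchmarkScores: false,
  highVolumeSimpleTasks: false,
  latencyCritical: false,
};
console.log(suggestModel(qualityFirst)); // Gemma 4 31B IT
console.log(suggestModel({ ...qualityFirst, latencyCritical: true })); // MoE Gemma 4 26B
```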
Conclusion
Gemma 4 31B IT is the quality-focused option in the Gemma 4 family. With all 31B parameters active during inference, it targets stronger output on complex tasks than the MoE Gemma 4 26B, at the cost of more compute per token. For teams that want open-weight flexibility with the highest reasoning quality the Gemma 4 generation offers, it is the right starting point on AI Gateway.
Frequently Asked Questions
What makes Gemma 4 31B IT different from the MoE Gemma 4 26B?
Gemma 4 31B IT is a dense model, meaning all 31B parameters are active during every forward pass. The MoE Gemma 4 26B activates roughly 4B of its 26B total parameters per pass. Gemma 4 31B IT targets higher output quality; the 26B variant targets lower latency and cost.
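To make the dense versus MoE difference concrete, the sketch below works through the arithmetic using the parameter counts quoted above. The "roughly 4B" figure is approximate, and active parameter count is only a rough proxy for per-token compute, so treat the results as an order-of-magnitude comparison. The type and function names are illustrative.

```typescript
const BILLION = 1e9;

interface ModelShape {
  label: string;
  totalParams: number;  // parameters stored in the weights
  activeParams: number; // parameters used in each forward pass
}

const dense31b: ModelShape = {
  label: 'Gemma 4 31B IT',
  totalParams: 31 * BILLION,
  activeParams: 31 * BILLION, // dense: every parameter is active
};

const moe26b: ModelShape = {
  label: 'MoE Gemma 4 26B',
  totalParams: 26 * BILLION,
  activeParams: 4 * BILLION, // "roughly 4B" per pass, per the answer above
};

// Share of a model's parameters that take part in each forward pass.
function activeFraction(model: ModelShape): number {
  return model.activeParams / model.totalParams;
}

// How many times more parameters `a` activates per pass than `b`.
function activeRatio(a: ModelShape, b: ModelShape): number {
  return a.activeParams / b.activeParams;
}

console.log(activeFraction(dense31b));          // 1
console.log(activeFraction(moe26b).toFixed(3)); // 0.154
console.log(activeRatio(dense31b, moe26b));     // 7.75
```

By this rough measure, the dense model activates about 7.75 times as many parameters per forward pass as the MoE variant, which is where the quality versus latency and cost trade-off comes from.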
What input modalities does Gemma 4 31B IT support?
Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens. It does not generate images or audio.
How does Gemma 4 31B IT relate to Google's Gemini models?
Gemma 4 31B IT is built on the same architecture as Gemini 3 but with open weights. It shares capabilities like function-calling, structured output, and system instructions. Gemini models remain proprietary; Gemma 4 31B IT lets you inspect or adapt the weights.
What languages does Gemma 4 31B IT support?
Over 140 languages. The instruction-tuning covers multilingual conversational and task-oriented use cases.
How do I use Gemma 4 31B IT on AI Gateway?
Set the model to google/gemma-4-31b-it in the AI SDK. AI Gateway handles provider routing, retries, and failover automatically.
Does Gemma 4 31B IT support function-calling?
Yes. It supports function-calling for agentic workflows, structured JSON output, and system instructions natively, inherited from the Gemini 3 architecture.
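On the application side, function-calling usually means receiving a structured request from the model and routing it to your own code. The sketch below shows that routing step in isolation. It is not the AI SDK's tool API: the `ToolCall` shape, the `getWeather` tool, and `dispatchToolCall` are hypothetical names chosen for illustration, and the JSON string stands in for a model response.

```typescript
// Hypothetical shape of a tool request emitted by the model as JSON.
type ToolCall = { name: string; arguments?: Record<string, unknown> };
type ToolHandler = (args: Record<string, unknown>) => string;

// Registry of functions the application is willing to run.
const toolRegistry: Record<string, ToolHandler> = {
  // Illustrative tool: a real handler would call a weather service.
  getWeather: (args) => {
    const city = typeof args.city === 'string' ? args.city : 'unknown';
    return `Weather lookup requested for ${city}`;
  },
};

// Parses the model's JSON output and runs the matching handler.
function dispatchToolCall(rawCall: string): string {
  const call = JSON.parse(rawCall) as ToolCall;
  const handler = toolRegistry[call.name];
  if (!handler) {
    // Never run a function the model names but the app did not register.
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return handler(call.arguments ?? {});
}

const modelOutput = '{"name":"getWeather","arguments":{"city":"Oslo"}}';
console.log(dispatchToolCall(modelOutput)); // Weather lookup requested for Oslo
```

Keeping an explicit registry means the model can only trigger functions you have chosen to expose.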