Gemma 4 31B IT

Gemma 4 31B IT is Google's open-weight dense model with 31B parameters, all active during inference. Built on the Gemini 3 architecture, it targets higher output quality than its MoE sibling, with support for function-calling, structured JSON output, native vision, and 140+ languages.

Tool Use, Vision (Image), File Input
index.ts
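import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemma-4-31b-it',
  prompt: 'Why is the sky blue?',
})

// Print the response text as it streams in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}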

What To Consider When Choosing a Provider

  • Configuration: As a dense model with all parameters active, Gemma 4 31B IT uses more compute per token than the MoE Gemma 4 26B variant. Factor in the higher per-request cost and latency when evaluating provider variants for production traffic.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests (BYOK requests are not covered). See the AI Gateway documentation for configuration details.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
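
The compute gap described in the Configuration point can be put into rough numbers. The sketch below assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, and uses the roughly 4B active parameters this page cites for the MoE Gemma 4 26B. The constant and function names are illustrative, and the result is an order-of-magnitude estimate, not a measured latency or price ratio.

```typescript
// Rule-of-thumb assumption: forward-pass FLOPs per token ≈ 2 × active parameters.
// This ignores attention cost, memory bandwidth, and provider-specific serving.
const DENSE_ACTIVE_PARAMS = 31e9; // Gemma 4 31B IT: all parameters active
const MOE_ACTIVE_PARAMS = 4e9;    // MoE Gemma 4 26B: roughly 4B active per pass

function flopsPerToken(activeParams: number): number {
  return 2 * activeParams;
}

// Ratio of per-token compute between two models.
function computeRatio(activeA: number, activeB: number): number {
  return flopsPerToken(activeA) / flopsPerToken(activeB);
}

const ratio = computeRatio(DENSE_ACTIVE_PARAMS, MOE_ACTIVE_PARAMS);
console.log(`Dense vs. MoE compute per token: ~${ratio.toFixed(2)}x`);
```

Under these assumptions the dense model does several times more arithmetic per token, which is why the per-request cost and latency difference is worth measuring on real traffic before committing.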

When to Use Gemma 4 31B IT

Best For

  • Quality-critical generation tasks: You need the strongest output in the Gemma 4 family and can accept higher latency
  • Complex reasoning and analysis: Multi-step planning, code generation, and detailed document analysis
  • Multilingual applications: Serving users across 140+ languages with a single model
  • Vision-language tasks: Image understanding, visual Q&A, and document parsing within a context window of 262.1K tokens
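
For document-heavy workloads, it can help to check that an input fits the context window before sending it. The following is a minimal sketch with two labeled assumptions: the listed 262.1K figure is treated as 262,144 tokens, and token counts are approximated at about 4 characters per token, a coarse heuristic for English text. Image inputs also consume tokens and are not modeled here. All names are hypothetical.

```typescript
// Assumption: the listed 262.1K context window is treated as 262,144 tokens.
const CONTEXT_WINDOW_TOKENS = 262_144;

// Assumption: ~4 characters per token. Real tokenizer counts will differ,
// especially for non-English text and code.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// reservedForOutput leaves headroom for the model's reply.
function fitsInContext(text: string, reservedForOutput = 4_096): boolean {
  return estimateTokens(text) + reservedForOutput <= CONTEXT_WINDOW_TOKENS;
}

const sample = "Summarize the attached contract. ".repeat(100);
console.log(estimateTokens(sample), fitsInContext(sample));
```

A check like this is only a guard against obviously oversized requests; for exact limits, rely on the token usage the gateway reports.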

Consider Alternatives When

  • Latency and throughput are the priority: The MoE Gemma 4 26B activates fewer parameters per token and responds faster
  • Native image or audio generation: You need media output, which Gemma 4 31B IT does not support
  • High-volume low-complexity inference: A smaller or lighter model is more cost-effective
  • Proprietary-grade benchmark performance: Gemini 3 Pro may be a better fit for the most demanding benchmarks

Conclusion

Gemma 4 31B IT is the quality-focused option in the Gemma 4 family. With all 31B parameters active during inference, it targets stronger output on complex tasks than its MoE sibling, at the cost of more compute per token. For teams that want open-weight flexibility with the highest reasoning quality the Gemma 4 generation offers, it is the right starting point on AI Gateway.

Frequently Asked Questions

  • What makes Gemma 4 31B IT different from the MoE Gemma 4 26B?

    Gemma 4 31B IT is a dense model, meaning all 31B parameters are active during every forward pass. The MoE Gemma 4 26B activates roughly 4B of its 26B total parameters per pass. Gemma 4 31B IT targets higher output quality; the 26B variant targets lower latency and cost.

  • What input modalities does Gemma 4 31B IT support?

    Gemma 4 31B IT accepts text and image inputs within a context window of 262.1K tokens. It does not generate images or audio.

  • How does Gemma 4 31B IT relate to Google's Gemini models?

    Gemma 4 31B IT is built on the same architecture as Gemini 3 but with open weights. It shares capabilities like function-calling, structured output, and system instructions. Gemini models remain proprietary; Gemma 4 31B IT lets you inspect or adapt the weights.

  • What languages does Gemma 4 31B IT support?

    Over 140 languages. Its instruction tuning covers multilingual conversational and task-oriented use cases.

  • How do I use Gemma 4 31B IT on AI Gateway?

    Set the model to google/gemma-4-31b-it in the AI SDK. AI Gateway handles provider routing, retries, and failover automatically.

  • Does Gemma 4 31B IT support function-calling?

    Yes. Gemma 4 31B IT natively supports function-calling for agentic workflows, along with structured JSON output and system instructions, all inherited from the Gemini 3 architecture.
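
On the application side, function-calling means your code receives a tool name plus arguments from the model, runs the matching function, and returns the result. The sketch below shows only that dispatch step and has no SDK dependency. The `ToolCall` shape, the `getWeather` tool, and its stubbed data are hypothetical; the actual types come from whichever SDK you use.

```typescript
// Hypothetical shape of a tool call emitted by the model; real SDK types differ.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical tool registry mapping tool names to local handlers.
const tools = new Map<string, ToolHandler>([
  [
    "getWeather",
    (args) => {
      const city = typeof args.city === "string" ? args.city : "unknown";
      // Stubbed data for illustration; a real handler would call a weather API.
      return JSON.stringify({ city, forecast: "sunny" });
    },
  ],
]);

// Runs the requested tool and returns a JSON string to send back to the model.
function dispatchToolCall(call: ToolCall): string {
  const handler = tools.get(call.name);
  if (!handler) {
    return JSON.stringify({ error: `Unknown tool: ${call.name}` });
  }
  return handler(call.args);
}

console.log(dispatchToolCall({ name: "getWeather", args: { city: "Oslo" } }));
```

Returning an error payload for unknown tool names, rather than throwing, lets the model see the failure and recover within the same conversation.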