
Kimi K2 Turbo


Kimi K2 Turbo is Moonshot AI's throughput-oriented K2 variant. It runs the K2 Mixture-of-Experts (MoE) architecture without thinking overhead, built for streaming interfaces, high-volume pipelines, and agentic workflows where first-token latency drives responsiveness.

Tool Use
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'moonshotai/kimi-k2-turbo',
  prompt: 'Why is the sky blue?',
})

for await (const text of result.textStream) {
  process.stdout.write(text)
}

What To Consider When Choosing a Provider

  • Configuration: For agentic pipelines that need the lowest first-token latency, verify provider response time benchmarks for your deployment region.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
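Under these assumptions (the AI SDK's gateway conventions for credential discovery), authentication reduces to a single environment variable; the variable name below follows those conventions and is not taken from this page:

```shell
# One gateway key authenticates every model request; no per-provider
# credentials are needed. On Vercel deployments, an OIDC token can be
# used instead of a static key.
export AI_GATEWAY_API_KEY="your-gateway-key"
```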

When to Use Kimi K2 Turbo

Best For

  • Real-time streaming chat: First-token latency drives perceived responsiveness in chat interfaces. No thinking overhead means the first token arrives sooner, which users notice immediately
  • High-frequency tool-calling agents: Agents that execute many sequential or parallel tool calls benefit from the per-call latency reduction. A 100-step agentic workflow completes far sooner at turbo latency than at thinking latency
  • Sub-agents in multi-agent orchestration: When K2 Turbo serves as a worker node in a larger agentic system, its response time affects overall orchestration throughput. Fast sub-agents keep the pipeline moving
  • Cost-sensitive high-volume production: Lower latency often correlates with lower compute cost at scale. Kimi K2 Turbo delivers K2-level capability at a throughput-oriented configuration
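The compounding effect on multi-step agents can be sketched with back-of-envelope arithmetic. The per-call figures below are assumptions for illustration, not published benchmarks:

```typescript
// How per-call latency compounds over a sequential agent loop.
// Per-call times here are illustrative assumptions, not measured values.
function sequentialLatencySeconds(steps: number, perCallSeconds: number): number {
  return steps * perCallSeconds
}

const thinking = sequentialLatencySeconds(100, 2.0) // 200s if each call carries ~2s of deliberation
const turbo = sequentialLatencySeconds(100, 0.5) // 50s at an assumed turbo per-call latency

console.log(thinking - turbo) // 150 seconds saved over a 100-step workflow
```

Even a modest per-call saving multiplies across every step of the loop, which is why first-token latency dominates agent throughput.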

Consider Alternatives When

  • The task requires explicit reasoning steps: When chain-of-thought deliberation improves output quality, Kimi K2 Thinking or K2 Thinking Turbo is more appropriate
  • Complex multi-step planning is central to the workflow: Tasks where the model needs to plan before acting benefit from the thinking variants' deliberation budget
  • You're building a reasoning benchmark or evaluation: Reasoning benchmarks that reward explicit deliberation will show different scores from thinking-enabled variants

Conclusion

Kimi K2 Turbo is the right K2 configuration when speed is the binding constraint and chain-of-thought is overhead rather than a benefit. For streaming interfaces, high-frequency agents, and latency-sensitive pipelines, it delivers K2-generation capability at high throughput.

Frequently Asked Questions

  • What is the throughput advantage of Kimi K2 Turbo over other K2 variants?

    It drops the extended thinking overhead present in K2 Thinking and K2 Thinking Turbo, so generation time goes directly to token output. Check this page for live throughput and latency figures.

  • Does Kimi K2 Turbo support tool calling without thinking mode?

    Yes. Multi-step tool calling, parallel function execution, and long-horizon tool-use sequences all run in turbo mode without thinking overhead.

  • When was Kimi K2 Turbo launched?

    Moonshot AI launched Kimi K2 Turbo on September 5, 2025, and publishes release notes at https://www.moonshot.ai. See https://moonshotai.github.io/Kimi-K2/ for AI Gateway pricing, routing, and limits.

  • How does Kimi K2 Turbo differ from Kimi K2 Thinking Turbo?

    Both operate at turbo latency, but Kimi K2 Turbo has no thinking mode. Responses generate directly without deliberation. K2 Thinking Turbo keeps a compressed thinking budget and emits chain-of-thought reasoning at turbo speed. Use Turbo when reasoning isn't needed. Use Thinking Turbo when deliberation improves quality.

  • What is the context window for Kimi K2 Turbo?

    256K tokens, consistent with the K2 family.
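    A rough pre-flight check against that window can be sketched as below. The chars/4 heuristic is an assumption, not Moonshot's tokenizer, and the exact enforced limit may vary by provider:

```typescript
// Approximate budget check against a 256K-token context window.
// chars/4 is a crude English-text heuristic, not a real tokenizer.
const CONTEXT_WINDOW_TOKENS = 256 * 1024

function fitsInContext(input: string, reservedForOutput = 8_192): boolean {
  const approxTokens = Math.ceil(input.length / 4)
  return approxTokens + reservedForOutput <= CONTEXT_WINDOW_TOKENS
}

console.log(fitsInContext('Why is the sky blue?')) // true
```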

  • How do I start using Kimi K2 Turbo?

    Use the identifier moonshotai/kimi-k2-turbo with any supported interface. AI Gateway manages provider routing automatically.