Trinity Large Thinking

Trinity Large Thinking is a reasoning-focused variant in Arcee AI's Trinity Large family: a 398B-parameter sparse mixture-of-experts model with about 13B active parameters per token, built on Trinity Large Base and emphasizing extended chain-of-thought reasoning.

Capabilities: Reasoning, Tool Use, Implicit Caching
index.ts

```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'arcee-ai/trinity-large-thinking',
  prompt: 'Why is the sky blue?',
})

// Stream the response as it arrives
for await (const part of result.textStream) {
  process.stdout.write(part)
}
```

What To Consider When Choosing a Provider

  • Configuration: Reasoning traces add output tokens. Budget for longer completions, stream responses, and weigh the pricing of $0.25 per million input tokens and $0.90 per million output tokens against your cost model.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
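To make the pricing concrete, here is a minimal cost-estimation sketch at the listed rates ($0.25 per million input tokens, $0.90 per million output tokens). The function name and the token counts in the example are illustrative, not part of any API:

```typescript
// Listed rates for Trinity Large Thinking (USD per million tokens)
const INPUT_PER_M = 0.25
const OUTPUT_PER_M = 0.9 // output includes the reasoning trace

// Estimate the dollar cost of a single request from its token counts.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_PER_M + outputTokens * OUTPUT_PER_M) / 1_000_000
}

// Reasoning traces inflate output: 1,000 input + 8,000 output tokens
console.log(estimateCostUSD(1_000, 8_000)) // → 0.00745
```

Because reasoning traces land on the (more expensive) output side, output token volume usually dominates the per-request cost for this model.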

When to Use Trinity Large Thinking

Best For

  • Auditable enterprise workflows: Step-by-step reasoning you can log and review
  • Analytical error reduction: Tasks where visible intermediate steps lower error rates
  • Traceable code review: Debugging or refactoring where the model's steps run alongside your own review
  • Inspectable decision flows: Multi-step pipelines where each stage must be reviewable

Consider Alternatives When

  • Short single-turn replies: Trinity Large Preview answers faster with fewer tokens
  • Minimal output budget: Reasoning traces add tokens that may not fit your cost model
  • Cost-dominant workloads: Trinity Mini meets a lower price point when its quality bar is enough

Conclusion

Trinity Large Thinking adds trace-oriented, post-trained reasoning on top of Arcee AI's Trinity Large Base stack in AI Gateway. Choose it when auditable steps matter; choose Trinity Large Preview when you do not need that overhead.

Frequently Asked Questions

  • What's the difference between Trinity Large Thinking and Trinity Large Preview?

    Thinking emits extended chain-of-thought reasoning; Preview does not emphasize trace output. Thinking runs as a 398B sparse MoE with about 13B active parameters per token. Preview is a 400B-parameter (13B active) MoE aimed at long-context reasoning workloads. Choose Thinking when you need explicit reasoning traces; choose Preview when you do not.

  • Does chain-of-thought reasoning affect token usage?

    Yes. Intermediate steps count in the output, so expect higher token use than a short answer from the base preview model. Factor that into cost and latency planning.

  • When should I use this instead of Trinity Mini?

    When you need large-stack reasoning traces more than Mini's cost profile. Trinity Mini uses 26B total parameters with 3B active and fits high-volume, budget-sensitive inference. Trinity Large Thinking fits heavier reasoning and audit-style review, not minimal token use.

  • Do I need an Arcee AI account to access Trinity Large Thinking on AI Gateway?

    No. Use your AI Gateway API key or an OIDC token. You don't need a separate provider account.

  • Can I use Trinity Large Thinking with the AI SDK?

    Yes. Set model to arcee-ai/trinity-large-thinking in the AI SDK's streamText or generateText call. AI Gateway also exposes OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, and OpenResponses-compatible interfaces.

  • Is the reasoning trace useful for compliance or audit purposes?

    It can help. The model can surface intermediate steps you log next to the final answer. You still own retention, access control, and policy for those logs.

  • Does AI Gateway provide observability for Trinity Large Thinking requests?

    Yes. Token usage, latency, and cost appear in your AI Gateway dashboard for each request, with no extra instrumentation required.
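For teams using the OpenAI Chat Completions-compatible interface mentioned above rather than the AI SDK, a request body looks like the following sketch. The base URL comes from an environment variable here because the exact endpoint depends on your gateway setup (check your AI Gateway dashboard); the helper function is illustrative:

```typescript
// Sketch: an OpenAI Chat Completions-style request to Trinity Large Thinking.
// AI_GATEWAY_BASE_URL is an assumption — set it to your gateway's endpoint.
const BASE_URL = process.env.AI_GATEWAY_BASE_URL ?? ''

// Build a Chat Completions-shaped payload for this model.
function buildChatRequest(prompt: string) {
  return {
    model: 'arcee-ai/trinity-large-thinking',
    messages: [{ role: 'user' as const, content: prompt }],
    stream: true, // stream so the reasoning trace can be read as it is produced
  }
}

// Example dispatch (requires a valid endpoint and API key):
// const res = await fetch(`${BASE_URL}/chat/completions`, {
//   method: 'POST',
//   headers: {
//     Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
//     'Content-Type': 'application/json',
//   },
//   body: JSON.stringify(buildChatRequest('Why is the sky blue?')),
// })
```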