San Francisco startup Arcee AI this week released Trinity-Large-Thinking, a 400-billion-parameter open-source reasoning model licensed under Apache 2.0, VentureBeat reported. The model activates only 13 billion of its parameters per token through a sparse Mixture-of-Experts architecture, running roughly two to three times faster than comparably sized dense models on identical hardware. At $0.90 per million output tokens, Trinity costs approximately 96% less than Anthropic's Claude Opus 4.6 at $25 per million, while scoring within two points of it on key agent benchmarks.
Key Takeaways
- Arcee AI released Trinity-Large-Thinking, a 400B-parameter open-source reasoning model under Apache 2.0 at $0.90 per million output tokens.
- The model activates only 13B of its 400B parameters per token, running 2-3x faster than comparably sized dense models on the same hardware.
- Trinity scored 91.9 on PinchBench, within two points of Claude Opus 4.6's 93.3, at approximately 96% lower cost.
- Arcee trained the model for $20 million over 33 days with a 30-person team using 2,048 NVIDIA B300 Blackwell GPUs.
AI-generated summary, reviewed by an editor. More on our AI guidelines.
A $20 million bet from a 30-person team
Arcee committed $20 million, nearly half the roughly $50 million in total capital the company has raised, to a single 33-day training run on 2,048 NVIDIA B300 Blackwell GPUs. Thirty people built this. The base model trained on 17 trillion tokens curated in partnership with DatologyAI, with over 8 trillion generated synthetically, not by imitating a larger model but by condensing raw web text into denser, reasoning-oriented formats.
"Developers and enterprises need models they can inspect, post-train, host, distill, and own," CTO Lucas Atkins said in the launch announcement.
That statement carries weight right now. Chinese labs that once dominated open-weight AI are pulling back. Key technical leads departed from Alibaba's Qwen lab. Z.ai shifted toward proprietary enterprise platforms. In the U.S., the push for American open-source AI lost its biggest champion when Meta retreated from the frontier after Llama 4's troubled reception in April 2025. The vacuum at the top of the open-weight market is real. Arcee is trying to fill it.
Benchmarks that earn a second look
On PinchBench, a metric for autonomous agent tasks, Trinity scored 91.9. Claude Opus 4.6 sits at 93.3. On IFBench, the gap narrows further: 52.3 versus 53.1. Trinity also recorded 96.3 on AIME25, matching Kimi-K2.5 and outperforming GLM-5's 93.3.
Coding remains a gap. SWE-bench Verified: 63.2 against Opus 4.6's 75.6.
But the cost difference recalibrates that equation. If you're running agents at scale, cost-per-correct-answer matters more than raw accuracy on a single benchmark.
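That trade-off is easy to make concrete. The sketch below computes a naive cost-per-correct-answer using the article's prices ($0.90/M vs. $25/M output tokens) and PinchBench scores as a rough success-rate proxy; the 2,000-token average response length is an assumption for illustration.

```python
def cost_per_correct(price_per_m_tokens: float, accuracy: float,
                     avg_output_tokens: int = 2000) -> float:
    """Expected spend to get one correct answer, assuming each attempt
    emits avg_output_tokens and succeeds with probability `accuracy`
    (benchmark score used as a crude proxy for success rate)."""
    cost_per_attempt = price_per_m_tokens * avg_output_tokens / 1_000_000
    return cost_per_attempt / accuracy

# Source figures: Trinity $0.90/M output, 91.9 PinchBench;
# Opus 4.6 $25/M output, 93.3. Output length is assumed, not sourced.
trinity = cost_per_correct(0.90, 0.919)
opus = cost_per_correct(25.0, 0.933)
print(f"Trinity: ${trinity:.5f}  Opus: ${opus:.5f}  ratio: {opus / trinity:.0f}x")
```

Under those assumptions the per-correct-answer gap stays around 27x, because the benchmark scores are close while the prices are not.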
Trinity-Large-Preview already established itself as the most-used open model in the U.S. on OpenRouter, serving over 80.6 billion tokens on peak days and accumulating more than 3.4 trillion tokens since its January launch.
How the sparsity works
Each token routes through just 4 of 256 experts, 1.56% of the expert pool; with shared layers counted, roughly 13 billion of the 400 billion parameters are active per forward pass. Extreme sparsity at this scale creates training instability. Arcee built SMEBU, Soft-clamped Momentum Expert Bias Updates, to prevent a few experts from dominating while others sit idle.
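A minimal sketch of the routing pattern, assuming generic top-k MoE gating: a per-expert bias steers which experts get picked (the load-balancing role SMEBU plays) while the mixing weights stay unbiased. SMEBU's actual update rule isn't public, so the bias term here is purely illustrative.

```python
import numpy as np

def route_topk(hidden: np.ndarray, gate_w: np.ndarray,
               expert_bias: np.ndarray, k: int = 4):
    """Select the top-k experts per token. The bias nudges selection
    toward under-used experts but is excluded from the mixing weights."""
    logits = hidden @ gate_w                              # (tokens, n_experts)
    topk = np.argsort(logits + expert_bias, -1)[:, -k:]   # biased selection
    chosen = np.take_along_axis(logits, topk, -1)          # unbiased weights
    weights = np.exp(chosen) / np.exp(chosen).sum(-1, keepdims=True)
    return topk, weights

rng = np.random.default_rng(0)
n_experts, d = 256, 64          # 4-of-256 routing, toy hidden size
tokens = rng.normal(size=(8, d))
gate = rng.normal(size=(d, n_experts))
bias = np.zeros(n_experts)       # a balancer would update this over training
experts, weights = route_topk(tokens, gate, bias)
print(experts.shape, weights.shape)   # (8, 4) (8, 4)
```

The key property to notice: each of the 8 tokens touches only 4 expert columns, so compute per token stays fixed no matter how many experts exist.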
Attention alternates between local sliding-window and global layers in a 3:1 ratio for long-context handling. OpenRouter lists a 262,144-token context window. Arcee says the model natively supports 512,000 tokens.
DigitalOcean announced Trinity is already available on its Agentic Inference Cloud in public preview, giving developers a managed path to run agents alongside Kubernetes clusters and databases.
The thinking upgrade that fixes the preview's weakness
Trinity-Large-Thinking represents a pivot from the January "Preview" instruct model. Early users found Preview could be "underwhelming" for agentic tasks, struggling with multi-step instructions in complex environments. The thinking update implements a reasoning phase before generating responses. Plan the task, verify the logic, then answer.
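In practice, consuming a reasoning model's output usually means separating the thinking phase from the final answer. Many open reasoning models emit the plan inside `<think>...</think>` delimiters; whether Trinity uses this exact convention is an assumption, but the parsing pattern is the same either way.

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split hidden reasoning from the final answer, assuming the
    common <think>...</think> delimiter convention (Trinity's exact
    format is an assumption here)."""
    match = re.search(r"<think>(.*?)</think>\s*(.*)", raw, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", raw.strip()   # no delimiters: treat everything as answer

reasoning, answer = split_reasoning(
    "<think>Plan: check units, then divide.</think>The result is 42."
)
print(answer)   # The result is 42.
```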
Arcee also released Trinity-Large-TrueBase, a raw 10-trillion-token checkpoint untouched by instruction tuning or reinforcement learning. For regulated industries like finance and defense, TrueBase offers a starting point for custom alignment without inheriting the biases of a general-purpose chat model.
What comes next
Hugging Face CEO Clément Delangue endorsed the release in a direct message to VentureBeat: "The strength of the US has always been its startups so maybe they're the ones we should count on to lead in open-source AI."
Arcee plans to distill Trinity Large's reasoning into its smaller Mini and Nano models. No new base model training run yet. The company will scale up reinforcement learning on the existing checkpoint while refreshing the compact line with frontier-level knowledge.
For enterprises building sovereign AI infrastructure, the math is clean. A U.S.-made, Apache-licensed reasoning model at $0.90 per million tokens, scoring within two points of the best proprietary option on agent benchmarks. Open and closed just got closer.
Frequently Asked Questions
What is Arcee Trinity-Large-Thinking?
Trinity-Large-Thinking is a 400-billion-parameter sparse Mixture-of-Experts reasoning model from Arcee AI. It activates only 13 billion parameters per token through a 4-of-256 expert routing strategy, making it two to three times faster than dense models of similar size. Released under the Apache 2.0 license, it targets long-horizon agent tasks and multi-turn tool calling.
How does Trinity compare to Claude Opus 4.6 on benchmarks?
Trinity scored 91.9 on PinchBench versus Claude Opus 4.6's 93.3, and 52.3 on IFBench compared to 53.1. On AIME25, Trinity reached 96.3. The main gap is in coding: 63.2 on SWE-bench Verified against Opus 4.6's 75.6. At $0.90 per million output tokens versus $25, Trinity costs approximately 96% less.
How much did it cost Arcee AI to train Trinity?
Arcee spent $20 million on a 33-day training run using 2,048 NVIDIA B300 Blackwell GPUs. The company has raised roughly $50 million in total capital and employs 30 people. The base model trained on 17 trillion tokens, with over 8 trillion generated as synthetic data in partnership with DatologyAI.
What is Trinity-Large-TrueBase?
TrueBase is a raw 10-trillion-token checkpoint from the Trinity Large training run, released without instruction tuning or reinforcement learning. It lets researchers and enterprises in regulated industries perform custom alignment and audits starting from a clean foundation, without inheriting biases from a general-purpose chat model.
Where can developers access Trinity-Large-Thinking?
The model is available on OpenRouter at $0.25 per million input tokens and $0.90 per million output tokens, with a 262,144-token context window. It is also available on DigitalOcean's Agentic Inference Cloud in public preview. Model weights are downloadable from Hugging Face under the Apache 2.0 license.
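Since OpenRouter exposes an OpenAI-compatible chat-completions endpoint, calling the model looks like the sketch below. The model slug is a guess, not confirmed by the article; check OpenRouter's catalog for the exact identifier.

```python
import json
import urllib.request

# Assumption: this slug is illustrative; verify it in OpenRouter's catalog.
MODEL = "arcee-ai/trinity-large-thinking"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request for OpenRouter."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_request("Summarize sparse MoE routing in one sentence.", "sk-...")
# urllib.request.urlopen(req) would send it; omitted here to stay offline.
print(req.full_url)
```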