Anthropic ships frontier-class coding at commodity prices
Anthropic's Haiku 4.5 delivers May's frontier coding performance at one-third the cost, collapsing the capability-to-commodity timeline to five months. Multi-agent systems just crossed the economic threshold for production deployment.
Haiku 4.5 matches May’s best at one-third the cost, pushing multi-agent systems over the economic line.
Anthropic released Claude Haiku 4.5 today, pricing it at $1 per million input tokens and $5 per million output tokens—roughly a third of Sonnet 4.5’s $3/$15. The company says Haiku 4.5 delivers coding and “computer use” performance comparable to Sonnet 4, which set the bar in May. It’s available immediately on Claude.ai, via Anthropic’s API, and through Amazon Bedrock and Google Cloud Vertex AI. For free users, Anthropic is making Haiku 4.5 the default in some cases, while keeping manual selection available to everyone.
Sonnet 4.5 shipped two weeks ago; Sonnet 4 arrived five months back. Anthropic now argues that last spring’s frontier capability has been compressed into a faster, cheaper small model. Haiku 4.5 runs more than twice as fast as Sonnet 4 on key tasks, with similar results on coding, computer use, and tool-calling. That speed changes what developers can afford to run in parallel. It also shifts where the margins live.
Key Takeaways
• Haiku 4.5 matches Sonnet 4's May performance at $1/$5 per million tokens—one-third Sonnet 4.5's cost and twice the speed
• Multi-agent orchestration crosses economic threshold: Sonnet 4.5 plans while parallel Haiku 4.5 instances execute at production costs
• Frontier-to-commodity compression accelerates: capabilities that defined state-of-the-art in May now cost 67% less five months later
• Opus 4.1 relegated to "legacy" status two months post-launch as Anthropic consolidates around Sonnet frontier and Haiku production tiers
What’s actually new
The delta isn’t capability; it’s cost at capability. Sonnet 4 was frontier-class in May. Five months later, Haiku 4.5 offers that intelligence band at a fraction of the price and latency. It’s also the first Haiku with hybrid reasoning and an optional “extended thinking” mode—previously a big-model perk. In Anthropic’s internal testing, Haiku 4.5 scored 73.3% on SWE-bench Verified, averaged over 50 trials with a 128K thinking budget and default sampling, plus a prompt addendum that nudges heavy tool use and test writing. Note the test conditions. They matter.
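For readers who want to see what the mode looks like in code, here is a minimal sketch of enabling extended thinking through Anthropic's Messages API in Python. The model ID and token budget are illustrative assumptions, not the 128K benchmark configuration; check Anthropic's docs for exact identifiers and limits.

```python
# Minimal sketch: enabling extended thinking on Haiku 4.5 via Anthropic's
# Messages API. Model ID and budget are illustrative, not the benchmark setup.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-haiku-4-5",    # assumed model ID for Haiku 4.5
    max_tokens=16_000,           # must exceed the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 8_000,  # tokens spent reasoning before the answer
    },
    messages=[{"role": "user", "content": "Fix the failing test in utils.py"}],
)

# Thinking blocks and the final answer arrive as separate content blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```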
There’s a wrinkle on price. Haiku 4.5 costs more than Haiku 3.5’s $0.80/$4, even as Anthropic markets it as the economical choice. For teams moving up from Haiku 3.5, this is a capability upgrade that still invites a budget talk. For teams choosing between models, Haiku 4.5 undercuts Sonnet 4 by about 67% while matching its May-era coding performance. That’s the trade.
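A quick back-of-envelope calculation makes the trade concrete. The sketch below uses the per-million-token prices cited in this piece; the monthly token volumes are hypothetical, chosen purely for illustration.

```python
# Back-of-envelope cost comparison using the per-million-token prices cited
# above. The workload volumes are hypothetical.
PRICES = {  # (input $/M tokens, output $/M tokens)
    "haiku-3.5": (0.80, 4.00),
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.5": (3.00, 15.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Dollar cost for a workload of input_m/output_m million tokens."""
    in_price, out_price = PRICES[model]
    return input_m * in_price + output_m * out_price

# Hypothetical workload: 500M input tokens, 100M output tokens per month.
for model in PRICES:
    print(f"{model:>10}: ${monthly_cost(model, 500, 100):,.0f}/month")

# haiku-3.5: $800, haiku-4.5: $1,000, sonnet-4.5: $3,000. Haiku 4.5 is a 25%
# bump over 3.5 but roughly 67% below Sonnet pricing.
```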
The inference-economics threshold
Anthropic’s architecture story centers on multi-agent systems: let Sonnet 4.5 plan, then spin up a pool of Haiku 4.5 workers to execute subtasks in parallel—refactors, migrations, data gathering, and more. The pattern wasn’t technically new. It was economically out of reach. Running multiple frontier instances in lockstep shreds API budgets and blows latency targets.
Haiku 4.5 changes that calculus. It is fast, inexpensive, and capable enough to make parallel execution viable for production. In practice, that unlocks customer-support agents, pair-programming loops, and real-time assistants where throughput and responsiveness dominate. For deep planning, Sonnet 4.5 remains necessary. The sweet spot is orchestration: Sonnet breaks the problem; Haiku handles the work. Anthropic is baking that pattern into Claude Code. It’s a portfolio play that sells both models without forcing a binary choice. Smart.
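As a rough illustration of the pattern, here is a minimal planner/worker sketch using the Anthropic Python SDK. The model IDs, prompts, and plain-text task splitting are assumptions for demonstration, not Anthropic's Claude Code implementation; a production system would add structured outputs, retries, and result validation.

```python
# Sketch of the planner/worker pattern: Sonnet decomposes, Haiku executes in
# parallel. Model IDs and prompts are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
import anthropic

client = anthropic.Anthropic()

def ask(model: str, prompt: str) -> str:
    resp = client.messages.create(
        model=model,
        max_tokens=4_000,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

# 1. Sonnet plans: break the job into independent subtasks.
plan = ask(
    "claude-sonnet-4-5",
    "Break this refactor into independent subtasks, one per line: "
    "migrate the auth module off the deprecated session API.",
)
subtasks = [line for line in plan.splitlines() if line.strip()]

# 2. Haiku workers execute the subtasks in parallel; each call costs roughly
#    a third of a Sonnet call and returns about twice as fast.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(
        lambda task: ask("claude-haiku-4-5", f"Complete this subtask: {task}"),
        subtasks,
    ))
```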
Competitors are chasing the same curve. OpenAI is pushing small models toward GPT-5-class targets, and Google’s Gemini 2.5 Flash aims at low-latency, cost-sensitive jobs. The pattern holds: frontier capability migrates down to smaller models in months. The only open question is whether the frontier advances faster than the cost curve compresses. Right now, compression is winning.
The model hierarchy dissolves
Anthropic’s neat tiering—Opus for reasoning, Sonnet for balance, Haiku for speed—is blurring. Opus 4.1, launched in August, now sits in Claude.ai as a “legacy brainstorming model.” Internally, the recommendation has shifted to Sonnet 4.5 for most tasks, with Haiku 4.5 as the production workhorse. The lines aren’t just fuzzy; they’re being redrawn mid-generation.
That shift reflects product reality. Maintaining three equally prominent tiers while rivals ship quickly spreads attention thin. Consolidating around a frontier tier (Sonnet) and a production tier (Haiku) simplifies the story and positions Opus as optional rather than central. It also raises an obvious question: if an Opus 4.5 lands, where does it live in this new map?
Safety adds another twist. Haiku 4.5 is rated at AI Safety Level 2 (ASL-2), a notch less restrictive than Sonnet 4.5 and Opus 4.1 at ASL-3. Anthropic’s internal testing reports lower rates of misaligned behavior for Haiku 4.5 than for its larger siblings, and “limited risks” on CBRN-related assessments. In other words, the smallest, fastest, cheapest model is—by Anthropic’s own metrics—its safest. That’s notable.
The energy signal
Inference—not training—is where costs scale. Every query, every user, forever. Smaller models consume far less energy per response than giant ones, and those physics drive margins. Illustrative figures cited in coverage show a 405-billion-parameter model drawing thousands of joules per query while an eight-billion-parameter model draws a tiny fraction of that. The exact numbers vary by setup, but the direction is clear. Size multiplies cost.
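To see why the direction matters at fleet scale, a simple calculation helps. The joules-per-query figures below are assumptions in the spirit of the coverage cited above, not measurements; only the ratio is the point.

```python
# Illustrative only: scaling assumed per-query energy figures to fleet volume.
JOULES_PER_QUERY = {
    "405B-class model": 3_000,  # "thousands of joules" per response
    "8B-class model": 100,      # a small fraction of the large model
}
QUERIES_PER_DAY = 1_000_000_000  # hypothetical fleet volume

for model, joules in JOULES_PER_QUERY.items():
    kwh_per_day = joules * QUERIES_PER_DAY / 3.6e6  # 1 kWh = 3.6M joules
    print(f"{model}: {kwh_per_day:,.0f} kWh/day")

# At a billion queries a day, the gap runs to hundreds of megawatt-hours
# daily: the physics behind the push toward small models.
```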
That’s why small models matter. AI companies are plotting vast data-center buildouts over the next two years. Those plans pencil out only if inference costs fall fast enough to support mass deployment. Haiku 4.5’s economics—and rival efforts to match it—suggest the industry expects “good enough, fast” models to carry the volume. Anthropic even markets Haiku 4.5 for free-tier experiences, where cost per user must approach zero. That’s where scale lives.
Why this matters
• Frontier capabilities now commoditize in five months, shrinking the moat around raw performance and rewarding speed in deployment.
• Multi-agent workflows cross the economic threshold, enabling parallel execution at production latencies and budgets.
❓ Frequently Asked Questions
Q: What is "extended thinking mode" and why does it matter for Haiku?
A: Extended thinking mode allows the model to use additional tokens for internal reasoning before responding—similar to OpenAI's o1 approach. Haiku 4.5 is the first small Anthropic model with this capability, previously reserved for larger models. In benchmark tests, Anthropic used a 128K thinking budget, which lets Haiku 4.5 work through complex coding problems step-by-step rather than rushing to an answer.
Q: If Haiku 3.5 cost $0.80/$4, why did Haiku 4.5 pricing increase to $1/$5?
A: Haiku 4.5 delivers May's frontier-level performance (matching Sonnet 4) with hybrid reasoning capabilities that Haiku 3.5 lacked. The 25% price increase reflects that capability jump: you pay slightly more for intelligence that would have cost $3/$15 via Sonnet 4 five months ago. For teams upgrading from Haiku 3.5, it's a capability upgrade that still warrants a budget conversation.
Q: What does multi-agent orchestration actually look like in practice?
A: Sonnet 4.5 breaks down a complex task—say, refactoring a large codebase—into discrete subtasks with clear requirements. It then spawns multiple Haiku 4.5 instances to handle those subtasks in parallel: one refactors authentication logic, another updates database queries, a third rewrites API endpoints. Each Haiku worker costs a third of what Sonnet would and runs twice as fast, making parallel execution economically viable.
Q: What's the difference between ASL-2 and ASL-3 safety ratings?
A: AI Safety Levels measure catastrophic risk potential, particularly for CBRN (chemical, biological, radiological, nuclear) threats. ASL-3 models like Sonnet 4.5 and Opus 4.1 require stricter deployment controls and monitoring. Haiku 4.5's ASL-2 rating means it poses "limited risks" in safety testing and showed lower rates of misaligned behavior than larger models—making it Anthropic's least restrictive and, by their metrics, safest model.
Q: When should I choose Haiku 4.5 over Sonnet 4.5?
A: Choose Haiku 4.5 for latency-sensitive applications (customer support, pair programming, real-time assistants), high-volume deployments where cost matters, or as execution workers in multi-agent systems. Choose Sonnet 4.5 for complex reasoning tasks, multi-step planning, or when you need frontier performance. The optimal pattern: Sonnet plans, Haiku executes. Haiku 4.5 scored 73.3% on SWE-bench Verified versus Sonnet 4.5's higher mark, but runs twice as fast at one-third the cost.
Tech journalist. Lives in Marin County, north of San Francisco. Got his start writing for his high school newspaper. When not covering tech trends, he's swimming laps, gaming on PS4, or vibe coding through the night.
Oracle bets AI's next phase runs on private enterprise data, not public models. The database giant promises secure training without data exposure, backed by $500B in infrastructure. But tight coupling with Nvidia and OpenAI creates dependency risks.
OpenAI will let adults have erotic ChatGPT conversations by December—while facing a wrongful death lawsuit and fresh California AI regulations. The company claims new safety tools justify loosening restrictions. The age verification details remain unexplained.
Nvidia's $4K desktop AI box arrived five months late and $1K more expensive. It runs models too large for consumer GPUs but 4x slower than pro cards. The bet: developers need memory capacity more than speed for prototyping 70B+ parameter models locally.
Walmart wires ChatGPT to checkout, giving OpenAI 270 million weekly shoppers and landing an AI answer to Amazon's Rufus. But fresh food's exclusion reveals exactly where conversational commerce still hits operational walls.