Claude Sonnet 5 vs Opus 4.8: Where It Wins, Where It Doesn't

Anthropic released Claude Sonnet 5 on June 30 with a simple pitch: most of Opus 4.8's capability at well under half the cost. On the company's own agent benchmarks, the model trails its flagship by a few points on coding and slips narrowly ahead on knowledge work. The pitch rests entirely on that pairing, near-Opus performance at a mid-tier price. Whether it pays off comes down to which tasks a team runs against the model, and to a few catches Anthropic flags in its own disclosures.

Key Takeaways

Sonnet 5 comes within six points of Opus 4.8 on agentic coding (63.2% vs. 69.2%) at 40% of Opus 4.8's per-token price through August.
The $2/$10 introductory rate expires September 1, 2026, resetting to $3/$15, a 50% increase.
It is the right default for high-volume, always-on agents; Opus 4.8 still owns the hardest coding and offensive-security work.
Test your own workload against both models before trusting the headline discount, since the new tokenizer and September reset both move the math.

AI-generated summary, reviewed by an editor. More on our AI guidelines.

Where Sonnet 5 is the right default

Anthropic and its early access partners point to the same kind of workload: agents that run constantly and cannot afford to stall. Daniel Shepard, a senior engineer at Zapier, handed the model a two-part job: update Salesforce account tiers, then send a launch announcement to enterprise contacts. It finished end to end. "That used to stall halfway," Shepard said. "For day-to-day automation, it's a no-brainer." Cursor co-founder Sualeh Asif said Sonnet 5 agents "stay on plan, follow our conventions, and ship clean multi-step changes, all at an efficient cost," and Cursor's own CursorBench score moved from 49% on Sonnet 4.6 to 57% on Sonnet 5. Zimu Li, a member of technical staff at Factory, called the model "a strong execution layer for multi-step software engineering work," singling out sustained coding, tool use, and debugging "across messy technical contexts" and workflows "where follow-through and technical grounding matter."

That reliability gain matters more than the raw benchmark gap for three categories of work. Long-horizon coding and refactors benefit from a model that finishes rather than one that scores marginally higher but stalls partway; AWS says the model is "designed to navigate real codebases, land multi-file changes, and carry longer debugging and refactoring tasks through to completion." Browser and terminal automation, the category AWS highlights for financial services clients running spreadsheet modeling and self-auditing reporting agents, rewards completion consistency over peak intelligence. And knowledge work, such as synthesizing a long research document into a brief, is the one category where Anthropic's own numbers put Sonnet 5 ahead of Opus 4.8 rather than behind it.

The token math, and where it actually saves money

A team paying Opus 4.8's $5 input and $25 output rate for an agent that does not need Opus-level judgment can move to Sonnet 5 at $2 and $10, a nominal cut of roughly 60% through August and closer to 40% once standard pricing takes effect in September. The saving is real for high-volume, low-judgment agent loops. Picture a customer-service bot that fires hundreds of times a day and kicks only the hardest cases up to Opus. That is the fit.
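A rough way to check that math for a specific agent is to price its monthly token volume at each rate. The sketch below is a hypothetical Python helper that uses only the list prices quoted in this article; the workload in the example (400 calls a day, 6,000 input and 800 output tokens per call) is invented for illustration, not drawn from any reported deployment.

```python
# List prices quoted in this article, USD per 1M tokens: (input, output).
PRICES = {
    "opus-4.8": (5.00, 25.00),
    "sonnet-5-intro": (2.00, 10.00),     # through August 31, 2026
    "sonnet-5-standard": (3.00, 15.00),  # from September 1, 2026
}


def cost_usd(price_key: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a token volume at one of the price pairs above."""
    in_rate, out_rate = PRICES[price_key]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def saving_vs_opus(price_key: str, input_tokens: int, output_tokens: int) -> float:
    """Fractional saving versus running the same volume on Opus 4.8."""
    opus = cost_usd("opus-4.8", input_tokens, output_tokens)
    return 1 - cost_usd(price_key, input_tokens, output_tokens) / opus


if __name__ == "__main__":
    # Hypothetical bot: 400 calls a day for 30 days,
    # 6,000 input and 800 output tokens per call.
    calls = 400 * 30
    inp, out = calls * 6_000, calls * 800
    for key in PRICES:
        print(f"{key}: ${cost_usd(key, inp, out):,.2f} "
              f"(saving vs Opus: {saving_vs_opus(key, inp, out):.0%})")
```

For that made-up bot the month comes to $600 on Opus 4.8, $240 at the introductory rate and $360 at the standard rate. Because Sonnet 5's input and output rates are both exactly 40% of Opus 4.8's during the introductory period (and 60% afterward), the percentage saving is the same for any input/output mix; only the dollar figure moves.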

Teams migrating from Sonnet 4.6 face a catch here. The new tokenizer, shared with Opus 4.7, produces about 30% more tokens for the same input text, and up to 1.35 times as many on some content. Anthropic set the introductory price to absorb that increase relative to Sonnet 4.6. It did not calibrate the price against Opus 4.8, so the per-token comparison to Opus still holds. The introductory rate is also temporary. The $2/$10 pricing expires September 1, when input pricing rises 50% to $3 and output pricing rises 50% to $15, regardless of how usage changes.
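The tokenizer effect is easiest to see as a ratio against the old Sonnet 4.6 bill. This sketch takes the 1.30 and 1.35 factors from the figures above and assumes, for simplicity, that the same inflation applies to input and output tokens, which Anthropic's numbers do not spell out.

```python
# List prices quoted in this article, USD per 1M tokens: (input, output).
SONNET_46 = (3.00, 15.00)
SONNET_5_INTRO = (2.00, 10.00)
SONNET_5_STANDARD = (3.00, 15.00)


def cost_usd(prices, input_tokens, output_tokens, inflation=1.0):
    """Cost of a workload measured in old-tokenizer tokens.

    `inflation` approximates the new tokenizer by scaling both token counts
    (an assumption: the same factor is applied to input and output).
    """
    in_rate, out_rate = prices
    return inflation * (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def relative_to_sonnet_46(prices, inflation, input_tokens=1_000_000, output_tokens=200_000):
    """New bill as a fraction of the Sonnet 4.6 bill for the same text."""
    old = cost_usd(SONNET_46, input_tokens, output_tokens)
    new = cost_usd(prices, input_tokens, output_tokens, inflation)
    return new / old


if __name__ == "__main__":
    for factor in (1.30, 1.35):
        intro = relative_to_sonnet_46(SONNET_5_INTRO, factor)
        standard = relative_to_sonnet_46(SONNET_5_STANDARD, factor)
        print(f"{factor:.2f}x tokens: intro {intro:.0%}, "
              f"standard {standard:.0%} of the Sonnet 4.6 bill")
```

Under those assumptions, a workload moved over unchanged costs roughly 87% to 90% of its Sonnet 4.6 bill at the introductory rate, and roughly 130% to 135% once the $3/$15 rate applies, since the list price then matches Sonnet 4.6 while the token count does not.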

Anthropic's own effort-level dial points to the test worth running. A team can run the same workload on both models and compare completion rate and total token spend on its real prompts rather than on published benchmarks, then decide whether the mid-tier savings survive its specific pipeline.
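Once such a trial is logged, the comparison reduces to a few numbers per model. This hypothetical helper assumes each run is recorded with a pass/fail flag and its token counts (how a team judges "completed" is its own call); the model labels are placeholders, not API model identifiers.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One run of a real prompt against one model (fields are illustrative)."""
    model: str          # placeholder label, e.g. "sonnet-5"; not an API model ID
    completed: bool     # did the agent finish the job end to end?
    input_tokens: int
    output_tokens: int


def summarize(trials, prices):
    """Return per-model completion rate, total spend, and cost per completed task.

    `prices` maps each model label to (input $ per 1M tokens, output $ per 1M tokens).
    """
    runs_by_model = {}
    for t in trials:
        runs_by_model.setdefault(t.model, []).append(t)

    report = {}
    for model, runs in runs_by_model.items():
        in_rate, out_rate = prices[model]
        spend = sum(r.input_tokens * in_rate + r.output_tokens * out_rate
                    for r in runs) / 1_000_000
        done = sum(1 for r in runs if r.completed)
        report[model] = {
            "completion_rate": done / len(runs),
            "spend_usd": spend,
            # Failed runs still cost money, so spread total spend over successes only.
            "usd_per_completed": spend / done if done else float("inf"),
        }
    return report


if __name__ == "__main__":
    prices = {"sonnet-5": (2.00, 10.00), "opus-4.8": (5.00, 25.00)}
    trials = (
        [Trial("sonnet-5", ok, 10_000, 2_000) for ok in (True, True, True, False)]
        + [Trial("opus-4.8", True, 10_000, 2_000) for _ in range(4)]
    )
    for model, stats in summarize(trials, prices).items():
        print(model, stats)
```

Charging failed runs to the successful ones is the point of the design: a cheaper model that stalls more often pays for it in cost per completed task. In the made-up run above, Sonnet 5 finishes three of four tasks and still comes in at about $0.053 per completed task against Opus 4.8's $0.10.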


How it stacks up

Model	Price (input/output per 1M tokens)	Best for	Weak point
Claude Sonnet 5	$2/$10 intro to Aug 31, then $3/$15	High-volume agents, long-horizon coding, knowledge work	Trails Opus on the hardest coding; capped by design on cyber tasks
Claude Opus 4.8	$5/$25	Highest-accuracy coding and judgment calls	2.5 times Sonnet 5's intro price (about 1.7 times after September) for agent loops that don't need it
Claude Sonnet 4.6	$3/$15	Existing deployments not yet migrated	58.1% agentic coding vs. Sonnet 5's 63.2%; no cyber safeguards
GPT-5.5	$5/$30	Teams already standardized on OpenAI's stack	Priciest of the group on output; ties Opus 4.8 on input
Gemini 3.1 Pro	Higher than Sonnet 5	Google Cloud-native workflows	Undercut on price by Sonnet 5's intro rate
Gemini 3.5 Flash	Below Sonnet 5	Cost-floor workloads where full agentic capability isn't needed	No published head-to-head benchmark against Sonnet 5 in current reporting

Where it falls short

Three limits show up in Anthropic's own disclosures, not in outside testing. Opus 4.8 still leads on the hardest coding: by six points on agentic coding and by 2.3 points on Terminal-Bench 2.1, 82.7% to Sonnet 5's 80.4%. A team whose agent handles the top tenth of difficulty, the bugs that stump Sonnet 4.6 and Sonnet 5 alike, still needs Opus in the loop, at least as an escalation path.
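That escalation path, with Sonnet 5 handling routine work and Opus 4.8 taking what it cannot finish, can be sketched as a small router. Everything in it is hypothetical: the model calls are plain functions a team would wire to its own client code, and `is_hard` stands in for whatever difficulty signal (a ticket label, a prior failure) the team already has.

```python
from typing import Callable, Optional, Tuple

# A model call takes a task description and returns the agent's result,
# or None if the agent stalled or failed. These are stand-ins for a team's
# own client code; no real SDK is assumed here.
ModelCall = Callable[[str], Optional[str]]


def run_with_escalation(
    task: str,
    sonnet: ModelCall,
    opus: ModelCall,
    is_hard: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Route a task and report which path handled it.

    Known-hard tasks go straight to the stronger model; everything else tries
    the cheaper model first and escalates only if it fails to finish.
    """
    if is_hard(task):
        return "opus", opus(task)
    result = sonnet(task)
    if result is not None:
        return "sonnet", result
    return "sonnet->opus", opus(task)


if __name__ == "__main__":
    # Toy stand-ins: the cheaper model "fails" on anything marked flaky.
    def cheap(task: str) -> Optional[str]:
        return None if "flaky" in task else f"done: {task}"

    def strong(task: str) -> Optional[str]:
        return f"done (Opus): {task}"

    def hard(task: str) -> bool:
        return task.startswith("HARD:")

    for job in ("update account tiers", "flaky migration", "HARD: race condition"):
        print(run_with_escalation(job, cheap, strong, hard))
```

A production agent would also need a timeout or step budget to decide when a run has stalled; the sketch leaves that to whatever returns `None`.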


Dangerous cyber work is capped by design, even though the model still handles routine, non-harmful cyber tasks. Anthropic says it did not train Sonnet 5 on cyber tasks, and on a Firefox 147 exploit-development evaluation built with Mozilla, the model never produced a working exploit, a 0% score against Opus 4.8's 68.8% and Mythos 5's 88.4%. Sonnet 5 ships with the same real-time cybersecurity safeguards as Opus 4.7 and 4.8, and Anthropic says requests involving prohibited or high-risk cybersecurity topics may be refused. Anthropic built that ceiling deliberately, and its own guidance recommends Opus 4.8 instead for cybersecurity work that requires reduced guardrails.

The price has an expiration date baked in. The introductory rate expires September 1, and Priority Tier, an option available on other current Claude models, is not offered for Sonnet 5 at all. A migration planned entirely around June's headline price will look different on a September invoice.
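One guard against that September surprise is to key the rate to the date rather than hard-coding $2/$10 into a budget model. This sketch uses the two Sonnet 5 rates and the September 1, 2026 reset quoted in this article; the daily volumes in the example are illustrative only, and day-level granularity is an assumption since the exact cutover time isn't stated here.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator, Tuple

# Sonnet 5 list prices quoted in this article, USD per 1M tokens: (input, output).
INTRO = (2.00, 10.00)
STANDARD = (3.00, 15.00)
RESET_DATE = date(2026, 9, 1)  # first day billed at the standard rate


def sonnet5_rate(day: date) -> Tuple[float, float]:
    """Price pair in effect on a given calendar day."""
    return INTRO if day < RESET_DATE else STANDARD


def month_days(year: int, month: int) -> Iterator[date]:
    """Yield every calendar day in the given month."""
    day = date(year, month, 1)
    while day.month == month:
        yield day
        day += timedelta(days=1)


def projected_cost(days: Iterable[date], input_per_day: int, output_per_day: int) -> float:
    """Sum daily costs, applying whichever rate is in effect on each day."""
    total = 0.0
    for day in days:
        in_rate, out_rate = sonnet5_rate(day)
        total += (input_per_day * in_rate + output_per_day * out_rate) / 1_000_000
    return total


if __name__ == "__main__":
    # Illustrative volume: 2.4M input and 320K output tokens every day.
    for month in (8, 9):
        cost = projected_cost(month_days(2026, month), 2_400_000, 320_000)
        print(f"2026-{month:02d}: ${cost:,.2f}")
```

With identical daily volume, August's 31 days come to $248 and September's 30 days to $360: the per-day cost rises 50% overnight on September 1.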

The verdict

Sonnet 5 fits the agent that runs all day and mostly succeeds. It can absorb much of what teams currently route to Opus 4.8, such as a long refactor or an always-on reporting agent. The test is cheap: swap the model string, point it at real prompts for a week, and check whether the completion rate holds before assuming the savings do. It is a poor fit for the hardest coding problems, for anything touching offensive security, and for any budget that treats the $2/$10 rate as permanent, since that rate expires September 1.

Frequently Asked Questions

What does Claude Sonnet 5 cost?

$2 per million input tokens and $10 per million output tokens through August 31, 2026, then $3 and $15 per million tokens after that, according to Anthropic.

Is Sonnet 5 better than Opus 4.8?

Not on the hardest coding and reasoning tasks, where Opus 4.8 leads by about six points on Anthropic's agentic-coding benchmark. Anthropic says Sonnet 5 slightly edges Opus on one knowledge-work benchmark, GDPval-AA v2.

Can Sonnet 5 be used for cybersecurity or offensive-security work?

Anthropic did not train it for cyber tasks, and it never produced a working exploit in a Firefox 147 test built with Mozilla. The company recommends Opus 4.8 instead for cybersecurity work that needs reduced guardrails.

Will Sonnet 5 stay this cheap?

No. The $2/$10 introductory rate is temporary and resets to $3/$15 per million tokens on September 1, 2026, a 50% increase on both input and output pricing.


AI News

Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai
