Anthropic said Thursday that three product-layer changes shipped between March and April degraded Claude Code, closing out weeks of user complaints and public pushback from company staff. In an engineering postmortem, the company traced the drop to a March 4 reasoning-effort downgrade, a March 26 caching bug that cleared thinking blocks on every turn instead of once, and an April 16 verbosity instruction that later showed a 3% hit on one coding evaluation. Anthropic said it resolved all three issues by April 20 in v2.1.116 and reset usage limits for every subscriber as of April 23.
Key Takeaways
- Anthropic traced weeks of Claude Code complaints to three product-layer changes, not model regressions
- A caching bug cleared thinking blocks every turn instead of once, making Claude forget mid-session
- A verbosity-limiting system prompt line cut coding-eval scores by 3% for Opus 4.6 and 4.7
- All fixes shipped by April 20 in v2.1.116; Anthropic reset usage limits for every subscriber
AI-generated summary, reviewed by an editor. More on our AI guidelines.
Three bugs, three overlapping windows
On March 4, Anthropic changed Claude Code's default reasoning effort from high to medium to cut tail latencies that occasionally made the UI appear frozen. Internal evals said intelligence would drop only "slightly." Users said otherwise. The company reverted on April 7 and now defaults Opus 4.7 to xhigh and every other model to high.
A separate change that shipped on March 26 turned out to be the most damaging. Claude Code lead Boris Cherny explained on Hacker News that resuming a long-idle session meant a full cache miss, so engineers wanted to prune old thinking blocks from sessions idle for more than an hour, using the clear_thinking_20251015 header with keep: 1. The optimization was supposed to run once per stale session; a bug caused it to fire on every turn for the rest of the session. Claude kept executing tool calls, but with progressively less memory of why it had made them. Anthropic said it believes this also drove separate reports of rate limits draining faster than expected, because each pruned request produced a cache miss. The fix shipped April 10 in v2.1.101.
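Anthropic has not published the code, but the intended behavior it described can be sketched as a flag on the session: mark the session stale after an hour of idle time, prune once, then clear the flag. All names here (Session, needs_prune) are illustrative assumptions, not Anthropic's actual implementation.

```python
IDLE_THRESHOLD_SECS = 3600  # the one-hour idle cutoff from the postmortem

class Session:
    """Illustrative session state; names are assumptions, not Anthropic's code."""

    def __init__(self):
        self.thinking_blocks = []
        self.last_active = 0.0
        self.needs_prune = False

    def on_turn(self, now, new_block):
        # Mark the session stale if it sat idle past the threshold.
        if now - self.last_active > IDLE_THRESHOLD_SECS:
            self.needs_prune = True
        if self.needs_prune:
            # Rough equivalent of clear_thinking_20251015 with keep: 1.
            self.thinking_blocks = self.thinking_blocks[-1:]
            # Clearing the flag is the crucial step. The shipped bug behaved
            # as if this line were missing, so pruning fired on every turn
            # for the rest of the session.
            self.needs_prune = False
        self.thinking_blocks.append(new_block)
        self.last_active = now
```

With the flag cleared, a resumed session loses old thinking once and then accumulates context normally; without it, every turn is both a prune and a cache miss, which matches the forgetting and rate-limit symptoms users reported.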
The last regression arrived with Opus 4.7 on April 16. To curb the new model's verbosity, Anthropic added one line to the Claude Code system prompt: "Length limits: keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail." Broader ablation testing during the quality investigation later showed that one line cut coding-eval performance by about 3% on both Opus 4.6 and 4.7. Reverted April 20.
A forensic case from AMD opened the door
A major public flashpoint came on April 2, when Stella Laurenzo, identified in VentureBeat and Business Insider reporting as a senior director in AMD's AI group, filed a GitHub issue arguing that Claude Code could no longer be trusted for complex engineering work. Her evidence was not a vibe check. She pulled 6,852 session files from her team's production logs, covering 234,760 tool calls and 17,871 thinking blocks. The average number of files Claude read before making an edit had dropped from 6.6 to 2.0, and the median visible thinking length had collapsed from roughly 2,200 characters to about 600.
Cherny thanked her for the analysis but disputed her main conclusion, pointing to a UI-only change and the earlier medium-effort default. He was blunter on X, reportedly replying "This is false" to accusations that Anthropic had secretly nerfed the model. Three weeks later, Anthropic's postmortem confirmed that a real regression had shipped, though not through the specific mechanism Laurenzo identified. Cherny later wrote in a follow-up X thread, per coverage from Kingy.ai, that "this has probably been the most complex investigation we've had" and that "the root causes were not obvious, and there were many confounders."
What engineering review missed
The caching bug slipped past multiple human and automated code reviews, unit tests, end-to-end tests, and internal dogfooding. A separate internal experiment that changed how thinking was displayed suppressed the bug in most CLI sessions Anthropic's own engineers used. And the corner case required a session to cross the one-hour idle threshold once, which rarely happened in short test runs.
When Anthropic back-tested its enterprise Code Review tool against the offending pull requests, Opus 4.7 flagged the defect. Opus 4.6 did not. Anthropic sells AI code review as a coding-assistant feature, and its own pipeline missed a bug that its newer model found in retrospect.
The trust debt
The bigger cost is trust. For most of April, named Anthropic staff publicly disputed claims of deliberate degradation and pointed users to UI changes or effort-setting explanations, while developers kept posting telemetry that suggested a broader product regression. Claude Code is already large enough that even small regressions compound quickly across a paying customer base. Users felt blindsided not because bugs happened, but because the official line from Cherny and others, that the thinking-redaction change was UI-only and the evals looked clean, held right up until the company admitted three separate regressions had shipped.
That pattern repeated elsewhere on the Claude surface this month. On April 21, Anthropic briefly pulled Claude Code from its $20 Pro plan in a pricing test that Ars Technica reported reached roughly 2% of new prosumer signups, even though the public pricing page changed for everyone. Head of growth Amol Avasare reversed the page change within a day and apologized for the confusion. Both episodes share the same failure mode: communication lags that leave paying users to discover product changes through social media.
What changes next
Every Claude Code system prompt change now runs through a full per-model eval sweep with ablations, gated by new audit tooling, and model-specific instructions must be scoped in CLAUDE.md. Future changes that trade against intelligence, whether for speed or cost, will ship with a soak period and a gradual rollout. More Anthropic staff will dogfood the exact public build of Claude Code, and a new @ClaudeDevs account on X will centralize the explanation work that has been falling to scattered employee replies.
Users will judge the next regression on two clocks: how fast Anthropic spots it, and how fast it says so in public.
Frequently Asked Questions
What caused the Claude Code quality drop?
Three separate product-layer changes. Anthropic lowered the default reasoning effort from high to medium on March 4, shipped a caching bug on March 26 that cleared thinking blocks every turn instead of once per stale session, and added a verbosity-limiting system prompt on April 16 that cut coding-eval scores by about 3%.
Was the underlying Claude model changed?
No. Anthropic said its API and inference layer were unaffected throughout the period. The regressions came from Claude Code's harness and the system prompts surrounding the models, not from the model weights. The issues hit Sonnet 4.6, Opus 4.6, and Opus 4.7 across different dates.
When were the three issues fixed?
Anthropic reverted the reasoning-effort default on April 7. It patched the caching bug on April 10 in version 2.1.101. The verbosity-limiting system prompt was removed on April 20 in version 2.1.116. The company also reset usage limits for every subscriber as of April 23.
Who first flagged the regression publicly?
AMD senior director Stella Laurenzo filed a GitHub issue on April 2 arguing Claude Code could no longer be trusted for complex engineering work. She backed it with telemetry from 6,852 session files covering 234,760 tool calls and 17,871 thinking blocks, showing that the average number of files read before each edit had fallen from 6.6 to 2.0.
What is Anthropic changing to prevent a repeat?
Every Claude Code system prompt change now runs through a per-model eval sweep with ablations, gated by new audit tooling. Model-specific instructions are scoped in CLAUDE.md. Future intelligence tradeoffs ship with a soak period and gradual rollout. A new @ClaudeDevs account on X centralizes product explanations.


