Claude Code’s quiet power play: tooling, not trophies

💡 TL;DR: The 30-Second Version

🔄 Anthropic adds checkpoints to Claude Code that auto-save your code before each edit, letting developers rewind by pressing Escape twice or running the /rewind command. They are designed to sit alongside Git for safe autonomous refactoring.

💻 New VS Code extension (beta) brings inline diffs and sidebar visibility to Claude Code, moving the tool from terminal-only to IDE-native as Anthropic competes with Cursor's polished developer experience.

🤖 The Claude Code SDK becomes the Claude Agent SDK, with subagents for parallel work, hooks for automated testing, and background tasks, expanding its scope beyond coding to general agent development across domains.

📊 Context editing plus the memory tool improved agent performance 39% in Anthropic's internal tests, and context editing alone cut token consumption 84% in a 100-turn evaluation. That is the kind of context discipline 30-hour autonomous runs depend on.

⚡ Early third-party tests suggest Sonnet 4.5 completes code reviews in about two minutes versus ten for GPT-5 Codex, while GPT-5 Codex handles complex production debugging more reliably. The pattern points to specialization rather than dominance.

🏗️ Developer tooling becomes the real battleground as frontier models converge on capability—operational polish and deployment infrastructure now matter more than benchmark scores for enterprise adoption.

Benchmarks got the headlines. Checkpoints, a VS Code extension, and an agent SDK will decide who wins on real developer desks.

Anthropic’s newest coding push is less about topping a leaderboard and more about removing the sand in the gears. With Claude Sonnet 4.5, the company shipped a trio of practical upgrades (checkpoints, a native VS Code extension in beta, and a renamed Agent SDK) that make autonomous coding feel less like a demo and more like a workflow. Anthropic says Sonnet 4.5 can run hands-off for about 30 hours, but its strategic bet is that these Claude Code upgrades will matter more day to day than that stat.

Checkpoints turn “let it run” into a sane default

Claude Code now saves a checkpoint of your code before each edit the agent makes. If a sweeping refactor goes sideways, you can rewind instantly, restoring the code, the conversation, or both, with a double-tap of Escape or the /rewind command. It’s rollback without ritual, meant to sit alongside Git rather than replace it. The point is confidence: you can assign bigger, riskier tasks because recovery is one keystroke away.

The new guardrail lands with a clear target audience: teams who want agents to touch real code but hate babysitting them. It reduces the human overhead that kept many “autonomous” sessions tethered to Slack. That’s the unlock.
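To make the rewind semantics concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Claude Code's implementation: the `Session` and `Checkpoint` classes are hypothetical, each checkpoint records only the file the agent is about to change (mirroring the point that checkpoints track Claude's own edits, not yours), and a rewind can target the code, the conversation, or both.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checkpoint:
    """State saved just before one agent edit (hypothetical model)."""
    path: str                 # file the agent was about to change
    previous: Optional[str]   # its content before the edit; None = file did not exist
    conversation_len: int     # chat length at the moment of the edit


@dataclass
class Session:
    files: dict[str, str] = field(default_factory=dict)
    conversation: list[str] = field(default_factory=list)
    checkpoints: list[Checkpoint] = field(default_factory=list)

    def agent_edit(self, path: str, new_text: str) -> None:
        # Save first, then apply: a checkpoint before every agent edit.
        self.checkpoints.append(
            Checkpoint(path, self.files.get(path), len(self.conversation))
        )
        self.files[path] = new_text

    def rewind(self, index: int, mode: str = "both") -> None:
        """Return to the state just before agent edit `index`.

        mode is "code", "conversation", or "both".
        """
        if mode not in ("code", "conversation", "both"):
            raise ValueError(f"unknown mode: {mode!r}")
        target = self.checkpoints[index]
        if mode in ("code", "both"):
            # Undo agent edits newest-first so the oldest snapshot wins.
            for cp in reversed(self.checkpoints[index:]):
                if cp.previous is None:
                    self.files.pop(cp.path, None)
                else:
                    self.files[cp.path] = cp.previous
            del self.checkpoints[index:]
        if mode in ("conversation", "both"):
            del self.conversation[target.conversation_len:]


if __name__ == "__main__":
    s = Session(files={"app.py": "v1"})
    s.conversation.append("user: refactor app.py")
    s.agent_edit("app.py", "v2")
    s.conversation.append("agent: refactored app.py")
    s.agent_edit("util.py", "helper")
    s.files["README.md"] = "notes"  # a manual edit: no checkpoint is taken
    s.rewind(0, mode="both")
    print(s.files)         # {'app.py': 'v1', 'README.md': 'notes'}
    print(s.conversation)  # ['user: refactor app.py']
```

In this model the manual README edit survives the rewind because it was never checkpointed, which is exactly why Git still matters for changes you make yourself.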

⌨️ Claude Code 2.0 — New Commands & Shortcuts

/rewind (new)

Instantly restore a checkpoint. Choose what to roll back:

Code only — revert file changes
Conversation only — rewind chat context
Both — full restoration to a prior state

Tip: You can also trigger this with Esc + Esc

New shortcuts

Esc + Esc — Open checkpoint rewind
Ctrl + R — Search prompt history


IDE-native visibility, finally

A beta VS Code extension moves Claude Code from the terminal into the editor. You get a live sidebar with inline diffs of proposed changes, so reviewing and accepting edits feels like normal code review instead of CLI archaeology. Meanwhile, the refreshed terminal gains clearer status readouts and a searchable prompt history (Ctrl+R). Small things, but they compound in daily use.

This is catch-up with a purpose. Cursor-style experiences taught developers to expect agentic help where they actually work: inside the editor, with transparent diffs and click-to-apply controls. Anthropic doesn’t need to reinvent that wheel. It needs to make its wheel frictionless.

From “coding tool” to agent platform

Anthropic also renamed its Claude Code SDK to the Claude Agent SDK, and what sits under the hood tells the story. Subagents let a main worker spin up specialists (front-end here, API there), each with its own context. Hooks trigger tests, linters, or policy checks at defined points in the agent's loop, such as before or after a tool runs. Background tasks keep long-running services alive without blocking the rest of the job. That’s orchestration, not vibes.

What does that buy you? Parallelism with permissions. You can structure work the way a senior engineer would: split responsibilities, gate changes with automated checks, and keep the dev server humming while other threads progress. It’s the difference between a clever assistant and a team lead that delegates.
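The shape of that delegation can be sketched with plain `asyncio`. This is not the Claude Agent SDK's API: `Subagent`, `no_todo_hook`, `dev_server`, and `lead` are hypothetical names, real model calls are replaced by a short sleep, and the "hook" is reduced to a function that approves or rejects a proposed patch.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subagent:
    """A specialist with its own private context (hypothetical)."""
    name: str
    context: list[str] = field(default_factory=list)

    async def run(self, task: str) -> str:
        self.context.append(task)   # only this subagent sees its task
        await asyncio.sleep(0.01)   # stand-in for real model calls
        return f"{self.name}: patch for {task}"


# A hook inspects a proposed patch and returns True to let it through.
Hook = Callable[[str], bool]


def no_todo_hook(patch: str) -> bool:
    # Hypothetical policy check; a real hook might run tests or a linter.
    return "TODO" not in patch


async def dev_server(stop: asyncio.Event) -> None:
    # Background task: keeps "running" until the lead signals completion.
    while not stop.is_set():
        await asyncio.sleep(0.005)


async def lead(tasks: dict[str, str], hooks: list[Hook]) -> list[str]:
    stop = asyncio.Event()
    server = asyncio.create_task(dev_server(stop))
    agents = [Subagent(name) for name in tasks]
    # Subagents work in parallel; gather returns results in task order.
    patches = await asyncio.gather(*(a.run(t) for a, t in zip(agents, tasks.values())))
    stop.set()
    await server
    # Gate: a patch is accepted only if every hook approves it.
    return [p for p in patches if all(hook(p) for hook in hooks)]


if __name__ == "__main__":
    accepted = asyncio.run(lead(
        {"frontend": "add login form", "api": "add /login route TODO"},
        [no_todo_hook],
    ))
    print(accepted)  # ['frontend: patch for add login form']
```

The design choice worth noticing is the split of concerns: parallelism lives in the lead, context isolation lives in each subagent, and acceptance is decided by checks the team defines up front.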

Longer runs still need smarter context

Sonnet 4.5’s headline is stamina: Anthropic and third-party coverage highlight an uninterrupted 30-hour coding run that produced a Slack-like app. Endurance matters, but only if the agent can manage what it remembers. That’s where Anthropic’s broader platform moves quietly become force multipliers: context editing clears stale tool results, and a file-backed memory tool keeps persistent notes, together trimming token bloat while preserving the right breadcrumbs across long sessions. On an internal agentic-search evaluation, Anthropic reports that combining the two lifted agent performance by 39 percent.

The upshot: long horizons without context thrash. Less manual pruning. Fewer “what were we doing again?” moments. It’s housekeeping for agents.
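The clearing half of that housekeeping can be illustrated with a small Python sketch. The keep-the-newest-N policy, the `Message` type, and the roughly 4-characters-per-token estimate are assumptions for illustration; Anthropic's actual context-editing triggers and settings may differ.

```python
from dataclasses import dataclass

PLACEHOLDER = "[tool result cleared]"


@dataclass
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str


def clear_stale_tool_results(history: list[Message], keep_last: int = 2) -> list[Message]:
    """Return a copy of `history` in which every tool result except the
    newest `keep_last` is replaced by a short placeholder."""
    tool_idx = [i for i, m in enumerate(history) if m.role == "tool"]
    stale = set(tool_idx[:-keep_last]) if keep_last > 0 else set(tool_idx)
    return [Message(m.role, PLACEHOLDER) if i in stale else m
            for i, m in enumerate(history)]


def approx_tokens(history: list[Message]) -> int:
    # Rough heuristic: about 4 characters per token.
    return sum(len(m.content) // 4 for m in history)


if __name__ == "__main__":
    history = ([Message("user", "find the bug")]
               + [Message("tool", "x" * 4000) for _ in range(5)]
               + [Message("assistant", "found it")])
    trimmed = clear_stale_tool_results(history, keep_last=2)
    print(approx_tokens(history), approx_tokens(trimmed))  # 5005 2020
```

The user and assistant turns survive untouched; only bulky, already-consumed tool output is swapped for a marker, which is what keeps the agent's working set small without losing the thread of the task.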

Where it helps—and where it doesn’t

These upgrades don’t make agents omniscient. Checkpoints won’t tell you if the refactor was a good idea. Inline diffs won’t ensure the migration script covers every edge case. The Agent SDK can wire in hooks for tests and policy, but someone still has to define those tests and policies. Tools reduce operational friction; they don’t replace engineering judgment. That distinction keeps teams out of the hype ditch.

And yes, raw capability still matters. But when top-tier models increasingly trade benchmark wins, the real moat shifts to deployment polish: the safety rails, review flows, and “just works” integrations that let a cautious org move faster without feeling reckless. Tooling is the wedge.

The competitive read

OpenAI, Google, and Microsoft have distribution advantages in editors and suites. Anthropic’s answer is to make the developer experience so smooth that switching costs tilt in its favor—even without owning the default IDE. A credible VS Code presence, reliable rollbacks, and a production-minded agent SDK are the right angles of attack. If the editor experience keeps improving, Claude Code stops being a separate lane and starts feeling native. That’s the adoption game.

Bottom line

This release is a mindset shift masquerading as a feature drop. Anthropic is signaling that the path to real-world coding agents runs through ergonomics and guardrails, not just bigger context windows and prettier charts. If you care about what ships on Monday morning, that matters more than a new high-water mark on a leaderboard.

Why this matters

Tooling, not just tuning, will decide winners. Checkpoints, IDE-native diffs, and an agent SDK lower the cost of trusting agents with real code.
Longer runs only pay off with context discipline. Context editing plus memory makes 30-hour agents practical instead of fragile.

❓ Frequently Asked Questions

Q: Do checkpoints work with my existing Git workflow, or do they replace version control?

A: Checkpoints complement Git rather than replace it. They only track Claude's edits—not your changes or bash commands—and reset automatically when you rewind. Anthropic expects developers to use both: checkpoints for quick rollback during agent sessions, Git for permanent version control. When rewinding, you choose whether to restore code, conversation, or both independently.

Q: What does "beta" mean for the VS Code extension—is it stable enough for production work?

A: Beta status signals that Anthropic prioritized adoption over polish. Early users report the inline diffs work reliably, but the agent sometimes loses context when switching between terminal and IDE workflows. The extension feels additive: Claude operates in the IDE but maintains separate state from your active editing session. Expect rough edges but functional core features.

Q: Does the Claude Agent SDK require additional API costs beyond standard model pricing?

A: The SDK itself is free; developers pay only standard API rates, which for Sonnet 4.5 are $3 per million input tokens and $15 per million output tokens. Subagents, hooks, and background tasks run on the same pricing, but each subagent works in its own context, so parallel work multiplies token usage. Context editing and memory tools also carry no additional fees. The infrastructure is part of Anthropic's platform lock-in strategy, not a separate revenue stream.
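A back-of-the-envelope calculator helps when budgeting agent runs. It uses the Sonnet 4.5 rates quoted in this answer; the token counts are made-up illustration values, and the sketch ignores prompt-caching discounts and any other pricing tiers.

```python
INPUT_USD_PER_M = 3.00    # Sonnet 4.5 input rate quoted above
OUTPUT_USD_PER_M = 15.00  # Sonnet 4.5 output rate quoted above


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Per-token API cost in US dollars."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    single = cost_usd(2_000_000, 300_000)
    # Three parallel subagents, each reading 400k tokens and writing 50k.
    fan_out = 3 * cost_usd(400_000, 50_000)
    print(f"${single:.2f} ${fan_out:.2f}")  # $10.50 $5.85
```

Because output tokens cost five times as much as input tokens here, verbose agents get expensive faster than agents that read a lot.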

Q: What does "client-side" memory tool mean for data security and privacy?

A: Client-side means memory files live on your infrastructure, not Anthropic's servers. Claude makes tool calls requesting memory operations; your application executes them locally. You control where data is stored, how it's persisted, and who can access it. For regulated industries handling sensitive code or documents, this keeps everything behind your firewall while still enabling persistent agent knowledge.
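A minimal sketch of what "your application executes them locally" can look like in Python. The `LocalMemory` class and the `view`/`create`/`delete` command names are assumptions loosely modeled on file-style memory operations; check Anthropic's memory tool documentation for the real request schema. The path check is the part worth copying: since the model chooses the paths, the handler should refuse anything that escapes the memory directory.

```python
import tempfile
from pathlib import Path


class LocalMemory:
    """Hypothetical client-side executor for memory operations requested
    by the model. Files live under `root` on your own machine."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _safe(self, rel: str) -> Path:
        # Refuse paths that escape the memory directory, e.g. "../secrets".
        target = (self.root / rel).resolve()
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"path outside memory root: {rel}")
        return target

    def handle(self, call: dict) -> str:
        cmd, path = call["command"], self._safe(call.get("path", "."))
        if cmd == "view":
            if path.is_dir():
                return "\n".join(sorted(p.name for p in path.iterdir()))
            return path.read_text()
        if cmd == "create":
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(call["content"])
            return f"wrote {path.name}"
        if cmd == "delete":
            path.unlink()
            return f"deleted {path.name}"
        raise ValueError(f"unsupported command: {cmd}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mem = LocalMemory(d)
        mem.handle({"command": "create", "path": "notes/progress.md",
                    "content": "step 3 of 7 done"})
        print(mem.handle({"command": "view", "path": "notes/progress.md"}))  # step 3 of 7 done
```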

Q: Can I use Claude Code's checkpoints and SDK features with other AI models like GPT-5?

A: No—checkpoints, the VS Code extension, and the Agent SDK are proprietary to Anthropic's platform. They're designed specifically for Claude models and won't work with OpenAI, Google, or other providers. This creates switching costs: developers who build workflows around these tools face rebuilding context management and orchestration systems if they migrate to competitors.
