Cursor on Thursday released Composer 2, the third generation of its in-house coding model, scoring 61.7% on Terminal-Bench 2.0 against Anthropic's Claude Opus 4.6 at 58.0%, Bloomberg reported. The model costs $0.50 per million input tokens and $2.50 per million output tokens in standard mode. Opus 4.6 runs at $5 and $25 for the same volumes. Ten times the price.
The release landed on the same day OpenAI announced its acquisition of Astral, the company behind Python tools Ruff and uv, folding the team into its Codex coding platform. Two moves in a single Thursday. Both companies said the same thing without saying it directly. AI-assisted coding has moved past feature wars. This is an infrastructure land grab.
The Breakdown
- Cursor's Composer 2 scores 61.7% on Terminal-Bench 2.0, beating Opus 4.6 (58.0%) at one-tenth the token cost
- Self-summarization, a new RL technique, cuts context compaction errors by 50% while using one-fifth the tokens
- OpenAI acquires Python toolmaker Astral (Ruff, uv, ty) for Codex, now at 2 million weekly active users
- Anthropic captures 73% of first-time enterprise AI spending, up from 50% in January, per Ramp data
Three models in five months
Composer's trajectory compresses what normally takes years into a sprint. Cursor shipped the original Composer alongside its 2.0 platform redesign in October 2025. Composer 1.5 followed in February, still trailing Opus 4.6 by 10 percentage points on Terminal-Bench 2.0. Five months and three model generations later, Cursor has flipped that deficit into a 3.7-point lead.
Previous Composer versions applied reinforcement learning on top of an existing base model without touching the base weights. Composer 2 is the first version where the team ran continuous pre-training, building what Cursor calls "a far stronger base to scale our reinforcement learning."
Cursor co-founder Aman Sanger, who leads the company's research team from San Francisco, told Bloomberg the company trained Composer 2 solely on coding-related data. That specialization allowed a smaller, cheaper model.
"It won't help you do your taxes," Sanger said. "It won't be able to write poems."
Anthropic and OpenAI build general-purpose models that happen to code well. Cursor built a coding model that does nothing else. The company now has more than 1 million daily users, 50,000 business customers including Stripe and Figma, and is in talks for a roughly $50 billion valuation. That number would have sounded absurd for a code editor two years ago. The AI coding market has grown past the point where it sounds absurd now.
Teaching a model to forget on purpose
The technical bet behind Composer 2 is a training technique Cursor calls self-summarization.
Agentic coding generates enormous action histories. A model explores files, writes code, runs tests, backtracks, tries again. Those trajectories blow past context windows fast. Most systems handle this with compaction: either prompted summarization or a sliding window that drops older context. Both approaches lose information. Critical details vanish mid-task.
Cursor's method builds summarization directly into the reinforcement learning loop. When Composer hits a fixed token-length trigger, it pauses, compresses its own context into roughly 1,000 tokens, then continues working from the condensed version. The training reward covers the entire chain, including those summaries. Poor summaries that lost critical information get downweighted. Good ones get reinforced. The model learns what to keep and what to throw away.
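The loop described above can be sketched in miniature. Cursor has not published its implementation, so every name, threshold, and the toy "model" below are invented for illustration; the real system learns what to keep via the RL reward, whereas this stub just truncates.

```python
TRIGGER_TOKENS = 50      # stands in for the fixed token-length trigger (tiny, for demo)
SUMMARY_BUDGET = 5       # stands in for the ~1,000-token summary budget

def count_tokens(context):
    # Toy tokenizer: whitespace-split every entry in the history.
    return sum(len(entry.split()) for entry in context)

class ToyAgent:
    """Stand-in for the coding model: emits actions, can summarize itself."""
    def __init__(self):
        self.steps = 0

    def next_action(self, context):
        self.steps += 1
        return f"step {self.steps}: edit file run tests inspect output"

    def summarize(self, context, max_tokens):
        # Real self-summarization is learned; here we simply keep the tail.
        flat = " ".join(context).split()
        return " ".join(flat[-max_tokens:])

def run_agent(task, model, total_steps=20):
    context = [task]
    compactions = 0
    for _ in range(total_steps):
        if count_tokens(context) > TRIGGER_TOKENS:
            # The model compresses its own history; the condensed version
            # replaces the full trajectory and work continues from it.
            context = [task, model.summarize(context, SUMMARY_BUDGET)]
            compactions += 1
        context.append(model.next_action(context))
    return context, compactions

context, compactions = run_agent("fix the failing build", ToyAgent())
print(f"compactions: {compactions}, final context tokens: {count_tokens(context)}")
```

The key structural point survives even in the toy: because summarization happens inside the agent loop rather than as an external heuristic, the context never grows unboundedly, and in Cursor's version the summaries themselves sit inside the trajectory that the RL reward scores.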
Self-summarization reduces compaction errors by 50% compared to a heavily tuned prompt-based baseline, according to Cursor's research post, while using one-fifth of the tokens. In one test, Composer worked through 170 turns on a Terminal-Bench problem called make-doom-for-mips, compressing more than 100,000 tokens down to 1,000. Several frontier models failed the same problem outright.
Composer 2 still sits well behind OpenAI's GPT-5.4 at 75.1% on Terminal-Bench 2.0. The gap is real. But GPT-5.4 charges $2.50/$15 per million tokens, five times Composer's standard input rate and six times its output rate. For teams running thousands of agent sessions daily, that cost difference compounds.
OpenAI responds by buying the toolchain
Hours after Cursor's announcement, OpenAI revealed it would acquire Astral, the company behind uv, Ruff, and ty, three Python developer tools with hundreds of millions of monthly downloads. The Astral team will join Codex, OpenAI's coding platform that has grown to more than 2 million weekly active users since its launch, tripling since January.
Charlie Marsh, Astral's founder, framed the deal as acceleration. "If our goal is to make programming more productive, then building at the frontier of AI and software feels like the highest-leverage thing we can do," he wrote on the company's blog.
Astral is the latest in a string of acquisitions. OpenAI bought AI security startup Promptfoo earlier this month, and Software Applications Inc. and Neptune late last year. Each purchase adds infrastructure that makes Codex stickier. Codex lead Thibault Sottiaux said the goal is a platform capable of "working across the entire software developer lifecycle," not just generating code.
For Python developers who depend on Ruff for linting and uv for dependency management, the deal introduces a familiar anxiety. Tools you chose for speed and reliability now belong to an AI company with its own product roadmap. OpenAI promised to keep the tools open source. Promises in acquisition announcements carry the weight you'd expect.
The enterprise numbers behind the urgency
Underneath the product launches, the market has already moved. Anthropic now captures 73% of all spending among companies buying AI tools for the first time, according to Ramp data reported by Axios. In January, Anthropic and OpenAI were tied at 50%.
Anthropic got there with two products. Claude Code for developers, Claude Cowork for desktop agents. OpenAI shipped Codex, Atlas, Sora, Operator, ChatGPT Agent, Whisper, ChatGPT Search, Pulse, Prism, and ChatGPT Health over the same period. Fidji Simo, OpenAI's head of applications, told staff last week to stop chasing "side quests" and focus on coding and enterprise, The Wall Street Journal reported.
Cursor occupies an unusual position in the middle. It uses models from both Anthropic and OpenAI, counts OpenAI as an investor, and now competes with both on model quality. The company that popularized what Andrej Karpathy named "vibe coding" is betting it can outperform its own suppliers on the one task that matters most to its users.
The three-way race between Cursor, Anthropic, and OpenAI is not about who builds the best chatbot. It is about who controls the surface where professional software gets written. Cursor ships a cheaper, specialized model. OpenAI buys the Python toolchain. Anthropic captures enterprise budgets with fewer, sharper products.
Nobody plans to settle for one front. The consolidation is just getting started.
Frequently Asked Questions
What is self-summarization and how does it work?
Self-summarization is Cursor's technique for handling long coding tasks. When Composer hits a token limit, it pauses and compresses its own context to about 1,000 tokens. Because the reinforcement learning reward covers the entire chain including summaries, the model learns which details matter. Cursor says this cuts compaction errors by 50% compared to prompt-based methods.
How does Composer 2's pricing compare to competitors?
Composer 2 costs $0.50/$2.50 per million input/output tokens in standard mode, with a 3x fast mode at $1.50/$7.50. Anthropic's Opus 4.6 charges $5/$25, ten times more on both rates. OpenAI's GPT-5.4 runs $2.50/$15, five times Composer's standard input rate and six times its output rate.
What is Terminal-Bench 2.0?
Terminal-Bench 2.0 measures how well AI agents handle real-world software engineering tasks in a terminal environment. Composer 2 scores 61.7%, Opus 4.6 scores 58.0%, and GPT-5.4 leads at 75.1%. Cursor also uses CursorBench, its proprietary internal benchmark suite.
What does OpenAI's Astral acquisition mean for open-source Python tools?
Astral built uv for dependency management, Ruff for linting and formatting, and ty for type safety, all used by millions of developers. OpenAI says it will keep the tools open source after closing. The Astral team joins Codex to integrate developer tools with AI coding workflows. The deal is pending regulatory approval.
Why is Cursor valued at $50 billion?
Cursor has over 1 million daily users and 50,000 business customers including Stripe and Figma. The company popularized vibe coding and ships its own AI models while also supporting third-party models from Anthropic and OpenAI. Bloomberg reported the $50 billion valuation talks this month.
Implicator