Anthropic shipped Claude Opus 4.6 last week. Strong reviews. Benchmark wins across the board. The kind of release that usually buys a frontier lab months of breathing room.
The breathing room lasted about seven days.
On Wednesday morning in Shanghai, MiniMax released M2.5. A Mixture of Experts model, 230 billion parameters total, only 10 billion active at any given time. On SWE-Bench Verified, the coding benchmark enterprise buyers care about most, M2.5 scored 80.2%. Opus 4.6 scores 80.8%. That gap is 0.6 percentage points. On two alternative test harnesses, Droid and OpenCode, M2.5 actually pulled ahead.
Then came the price sheet. Standard M2.5 costs $0.15 per million input tokens and $1.20 per million output. Opus 4.6 charges $5 and $25. Running M2.5 continuously for an hour at a hundred tokens per second costs one dollar. Running Opus for the same hour costs roughly twenty.
For the past year, the story of U.S.-China AI competition was about closing the capability gap. That gap is now a rounding error. The new contest is about cost. And if you know anything about how Chinese manufacturers have reshaped solar panels, electric vehicles, and consumer electronics over the past two decades, you already know how cost wars with China tend to end.
The Argument
• MiniMax M2.5 scores 80.2% on SWE-Bench Verified, within 0.6 points of Opus 4.6, at roughly 5% of the cost per million tokens.
• Combined input-plus-output list price per million tokens: $1.35 for MiniMax versus $30 for Opus and $15.75 for GPT-5.2. The gap is 20x, not 2x.
• Western labs cannot match these prices without destroying the margins that fund their research operations and justify their valuations.
• Five major Chinese AI releases landed in a single week. The pricing pressure is structural, not promotional.
Five releases in five days
MiniMax did not ship in isolation. The same week, Zhipu AI launched GLM-5, a 744-billion-parameter model that sent its Hong Kong stock up nearly 30%. DeepSeek upgraded its flagship. Ant Group dropped Ming-Flash-Omni 2.0, a multimodal system handling speech, music, and video generation. ByteDance had already launched Seedance 2.0 the Monday prior. Five major Chinese AI releases in a single week.
Chinese Premier Li Qiang fanned the flames on Wednesday, calling for "scaled and commercialized application of AI" and better coordination of national computing resources. MiniMax stock jumped 13.7% in Hong Kong. Pure-play AI startups rallied across the board while Tencent and Alibaba, the established giants, fell 2.6% and 2.1% respectively.
Whether this was coordinated doesn't matter much. What matters is cadence. MiniMax alone has shipped three models in three and a half months, from M2 in late October to M2.5 now. In that span, its SWE-Bench Verified score climbed from 56% to 80.2%, outpacing the improvement rate of Claude, GPT, and Gemini over the same period. That kind of velocity makes Western release cycles look stately. A year ago, nobody in San Francisco was worried about MiniMax. They are now.
But the benchmarks are almost beside the point. Futurum Group analyst Rolf Bulk put it plainly on CNBC. Chinese tech companies "have taken a relatively frugal approach to AI development, with far less on capital expenditures than their American rivals." That frugality is the real weapon. Not the model itself. The margin structure underneath it.
Frontier AI's generic drug moment
One number should make Anthropic anxious. Not the benchmark score. The price-to-performance ratio.
MiniMax M2.5 scores 80.2% on SWE-Bench Verified. Opus scores 80.8%. GPT-5.2 scores 80.0%. On BrowseComp, a web search and reasoning benchmark, MiniMax leads outright at 76.3%. On BFCL multi-turn tool calling, the test that measures how well a model handles agent workflows, MiniMax's 76.8% beats both Claude and GPT.
Now look at the prices. MiniMax's total cost per million tokens comes to $1.35. Opus runs $30. GPT-5.2 runs $15.75. Gemini 3 Pro sits at $14. Functionally identical performance, but one option costs a tenth to a twentieth of everything else on the board. That's not a rounding error. That's a different business.
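The arithmetic is easy to reproduce. A minimal sketch in Python using the list prices quoted above (real invoices depend on the input/output mix, caching, and volume discounts, all of which this ignores):

```python
# Combined input + output list price per million tokens, as quoted above.
rates = {
    "MiniMax M2.5":    0.15 + 1.20,   # $0.15 in, $1.20 out
    "Claude Opus 4.6": 5.00 + 25.00,  # $5 in, $25 out
    "GPT-5.2":         15.75,         # combined figure as quoted
    "Gemini 3 Pro":    14.00,         # combined figure as quoted
}
baseline = rates["MiniMax M2.5"]
for model, total in rates.items():
    print(f"{model:16} ${total:6.2f}/M tokens ({total / baseline:.1f}x MiniMax)")
```

Run it and Opus comes out at roughly 22x MiniMax, GPT-5.2 at about 12x, Gemini at about 10x. Hence "a tenth to a twentieth."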
Think about what happens in pharmaceuticals when a generic proves bioequivalence. The brand-name version doesn't slowly bleed market share. It craters. Not because customers abandon trust in the original, but because procurement departments open their spreadsheets, run the math, and pick the cheaper option. Every time. The FDA requires generics to prove bioequivalence. MiniMax just proved benchmark equivalence. No procurement team is paying 20 times more for 0.6 percentage points on a coding test.
MiniMax calls this "intelligence too cheap to meter," borrowing the famous broken promise from nuclear energy's early days. The difference is that MiniMax is delivering the numbers. One dollar per hour of continuous generation. Four agents running nonstop for a full year at $10,000 total. The equivalent workload on Opus would cost somewhere north of $200,000.
The engineering that makes the price stick
Western skeptics might wave this off as subsidized pricing or a loss leader. The engineering suggests otherwise.
M2.5 runs on a sparse Mixture of Experts architecture. Every query routes to a small slice of the model's total parameters. You get frontier-class reasoning from what amounts to a ten-billion-parameter compute budget. Anthropic and OpenAI don't publish their architectures, but their prices imply far more active compute on every query. Heavier active compute costs more to serve. Period.
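To see why sparsity changes the serving bill, here is a toy top-k MoE layer in PyTorch. It is illustrative only: MiniMax has not published M2.5's routing scheme, and the sizes below are arbitrary. The mechanism is the point: each token pays compute for only top_k of num_experts feed-forward blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySparseMoE(nn.Module):
    """Toy top-k Mixture of Experts block, not M2.5's actual architecture.
    Each token activates only top_k of num_experts feed-forward networks."""

    def __init__(self, dim=64, num_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():                          # only these tokens pay for e
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToySparseMoE()
print(layer(torch.randn(8, 64)).shape)  # each token touched 2 of 16 experts
```

With 2 of 16 experts firing, per-token expert FLOPs are an eighth of the dense equivalent. Scale the idea up and you get M2.5's 10 billion active parameters out of 230 billion total.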
On the training side, MiniMax built Forge, a proprietary reinforcement learning framework. MiniMax engineer Olive Song, sitting in front of a webcam for a ThursdAI podcast appearance, walked through the details. Forge trains across hundreds of thousands of simulated real-world environments where M2.5 practices coding, searching, and tool use by actually performing those tasks. Two months of training. A stabilization algorithm called CISPO that keeps the sparse model from flying apart at scale. And an "Architect Mindset" that emerged from the training, where M2.5 plans the structure and interface of a coding project before writing a single line.
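CISPO is not a complete black box: MiniMax described it in earlier published work as clipping the importance-sampling weight rather than the policy update, so no token's gradient gets zeroed out mid-training. A minimal sketch of that idea (the epsilon values and the exact objective inside Forge are assumptions, not published M2.5 details):

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """CISPO-style policy loss sketch. Where PPO clips the update itself and
    silently drops gradients for clipped tokens, this clips and detaches the
    importance-sampling weight, so every token still contributes gradient
    through logp_new. Epsilon values are illustrative placeholders."""
    ratio = torch.exp(logp_new - logp_old)                     # IS weight r_t
    clipped = ratio.clamp(1 - eps_low, 1 + eps_high).detach()  # clip + stop-grad
    return -(clipped * advantages * logp_new).mean()           # REINFORCE-style
```

The practical effect is that large RL batches stay stable without discarding gradient signal, which is plausibly what "keeps the sparse model from flying apart at scale" refers to.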
The company eats its own cooking. 30% of all tasks inside MiniMax headquarters are completed by M2.5 right now. Not a pilot. Not a demo. Production use spanning R&D, sales, HR, and finance. 80% of newly committed code at the company is model-generated. That feedback loop, where operational use feeds the next training run, compounds in a way no lab running synthetic benchmarks in isolation can match.
More than 10,000 custom agent templates have already been built by users on MiniMax's platform. Finance teams use them for financial modeling. Legal departments use them for document review. Software teams use them across the full development lifecycle in over a dozen programming languages. The adoption curve is not theoretical.
Why Western labs can't just match the price
This is where the competitive picture turns from uncomfortable to structurally bleak.
Western frontier labs spend billions on compute infrastructure every year. Their pricing assumes premium margins to justify those billions. Opus at $30 per million tokens isn't an arbitrary figure. It's the revenue line that pays for GPUs, safety teams, researchers, and the next generation of models. Cut it by 95% and you don't have a viable business. You have a research lab in urgent need of a new funding model.
Chinese labs operate under different cost structures. Lower headcount expenses. State-adjacent compute access. A domestic market of 1.4 billion potential users where aggressive pricing captures share that Western competitors cannot reach. JP Morgan's Tai Hui, speaking on CNBC during the stock rally, dismissed AI bubble talk as "a little premature," citing solid fundamentals behind the surge. Those fundamentals include structural cost advantages that have nothing to do with model quality.
What makes MiniMax dangerous isn't a better model. It's a comparable model at a fraction of the operating cost. That kind of advantage doesn't yield to better training recipes or larger GPU clusters. Ask anyone who competed against Chinese solar panel manufacturers how that went. Or EV makers. Or consumer electronics firms. Same script every time: squeeze the cost structure, ship something good enough, and grind until the incumbents fold. The playbook is decades old. The industry is brand new.
Anthropic's safety premium won't survive a 20x price gap
If you run AI agent workloads at scale, the conversation just shifted underneath you. You no longer need to agonize over picking the single best model. You need the one that delivers adequate performance per dollar. For most production workloads, that model now has a Chinese name on it.
Anthropic faces the sharpest pressure. Claude's value proposition has rested on capability combined with safety credentials and a reputation for responsible development. That story still carries weight. But it carries considerably less when the price gap is 20x rather than 2x. Enterprise procurement can absorb a modest premium for trust and compliance guarantees. A 2,000% premium demands a wholly different pitch, one that no amount of safety branding has yet produced.
OpenAI sits in a less exposed position at $15.75 per million tokens for GPT-5.2. Still more than ten times MiniMax's rate, but the gap is narrower and GPT's brand carries different weight in enterprise accounts. Even so, OpenAI's massive capital commitments leave less room to compress margins than Anthropic has. And if you think OpenAI can just absorb the loss, remember that the company is trying to justify a multi-hundred-billion-dollar valuation. Margin destruction is not part of that story.
By summer, a dozen Chinese models will likely occupy the same performance band at similar cost points. When that happens, the pricing pressure won't come from one startup in Shanghai. It will come from an entire bench of competitors whose cost floors are structurally lower than anything a San Francisco lab can match. The defensive moat that Western labs have relied on, capability leadership, just got a lot thinner.
Seven days from celebration to commodity
Anthropic's Opus celebration lasted a week. The benchmark lead is 0.6 percentage points. The price gap is 20x, running the wrong direction.
The generic drug playbook says brand-name prescriptions lose roughly 80% of their volume within a year of a generic hitting the market. AI model switching is faster than prescription switching. There is no doctor in the loop. No regulatory review for substitution. No patient reluctance to overcome. There is a dropdown menu in every orchestration framework and API gateway on earth. And on that dropdown, MiniMax just appeared at one-twentieth the price with functionally identical test scores.
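That dropdown is not a metaphor. A hypothetical sketch, assuming the provider exposes an OpenAI-compatible endpoint (the base URL, key, and model name below are placeholders, not verified values):

```python
from openai import OpenAI

# Hypothetical provider swap: the only things that change are the endpoint,
# the key, and the model string. Values below are placeholders.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",
    api_key="YOUR_KEY",
)
reply = client.chat.completions.create(
    model="minimax-m2.5",  # swap this string; the rest of the stack is untouched
    messages=[{"role": "user", "content": "Fix the failing test in ci.yml"}],
)
print(reply.choices[0].message.content)
```

Three changed lines, no migration project, no doctor in the loop.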
The moat was never the model. It was the margin. MiniMax just proved the margin is optional.
FAQ
How does MiniMax M2.5 compare to Claude Opus 4.6 on coding benchmarks?
On SWE-Bench Verified, M2.5 scores 80.2% versus Opus 4.6's 80.8%, a gap of 0.6 percentage points. On two alternative coding harnesses, Droid and OpenCode, M2.5 actually scores higher. On BrowseComp and BFCL multi-turn tool calling, MiniMax leads both Claude and GPT outright.
Why is MiniMax M2.5 so much cheaper than Western frontier models?
M2.5 uses a sparse Mixture of Experts architecture that activates only 10 billion of its 230 billion total parameters per query. Western labs don't publish their architectures, but the pricing of Opus and GPT implies far more compute per token served.
What is MiniMax's Forge training framework?
Forge is a proprietary reinforcement learning system that trains M2.5 across hundreds of thousands of simulated real-world environments. The model practices coding, searching, and tool use by performing actual tasks. A stabilization algorithm called CISPO prevents the sparse architecture from degrading at scale.
Can Western AI labs just lower their prices to compete?
Not without breaking their business models. Opus at $30 per million tokens funds GPU infrastructure, safety research, and next-generation model development. Cutting prices by 95% would eliminate the revenue line that justifies billions in annual compute spending and multi-hundred-billion-dollar valuations.
What does this mean for enterprise buyers choosing AI models?
Enterprise procurement now faces a 20x price gap for functionally identical benchmark performance. Most production workloads, particularly agent-based systems running continuously, will gravitate toward the cheapest adequate option. The decision increasingly resembles generic drug substitution rather than premium brand selection.



