A developer's wife asked Claude one question about used Chevy Blazer EVs this week and immediately hit her session limit. One question. One answer. Locked out. She messaged her husband to ask if she was going crazy. She wasn't.

Across Reddit, GitHub, and X, Claude users have been reporting the same thing since March 23. Max subscribers paying $200 a month watched their usage meters jump from 21% to 100% on a single prompt. Max 5x users at $100 a month found their 5-hour sessions burning out in 90 minutes. Anthropic's own support chatbot went down during the chaos. Users felt blindsided. Paying customers, some of them handing over $200 a month, suddenly couldn't use the tool they'd built their workflows around.

The company finally confirmed the squeeze on Wednesday. Thariq Shihipar, a member of Anthropic's technical team, wrote that the company is "adjusting" session limits during peak hours, defined as weekdays from 5 a.m. to 11 a.m. Pacific. Your weekly cap stays the same, he said. You'll just burn through it faster when you actually need it.

The framing was classic demand management: shift your heavy work to off-peak hours, only 7% of users are affected, we're investing in efficiency. It read like an airline apologizing for overbooking. But the interesting question isn't why Anthropic is rationing seats. It's why the overbooking took this long to blow up.

Because when you run the numbers on what flat-rate AI subscriptions actually cost to serve, the answer is obvious. The math never worked.

Twenty dollars doesn't buy what you think it does

Anthropic doesn't publish what it spends to serve a single subscriber. But the company does publish API prices, and those prices offer a useful ceiling for estimating inference costs. API rates include compute, overhead, and margin. The actual cost to Anthropic sits below the listed price. So these calculations overstate the per-session cost. Even if Anthropic runs its models at half the API rate, the gap between what subscribers pay and what their sessions cost is too wide to ignore.

Sonnet 4.6 runs $3 per million input tokens and $15 per million output. Opus 4.6, the big model, costs $5 in and $25 out.

Take a Pro subscriber paying $20 a month. She uses Claude Sonnet for coding help, three hours a day during the workweek. Nothing extreme. Fifteen prompts per session. Each prompt ships around 50,000 tokens of code and chat history to the model. Claude fires back about 3,000. Nobody would call that heavy use.

Grab a napkin. Fifteen prompts at 50,000 input tokens each, at Sonnet's $3/million rate, costs $2.25 in input alone. The 3,000-token outputs add another $0.68. That's $2.93 for one day of conservative coding help.

Stretch that across a 20-day work month and the bill lands at $58.50 in raw inference, all for a subscriber who pays twenty bucks. Anthropic eats the other thirty-eight fifty. That money comes from the balance sheet, from venture investors, from somewhere. Almost three dollars of compute for every dollar collected, on a user who isn't even pushing hard.
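That napkin math is easy to check. A minimal sketch in Python, using the article's hypothetical usage profile (15 prompts a day, 50,000 tokens in, 3,000 out) and treating Anthropic's published Sonnet API rates as a ceiling on its true serving cost:

```python
# Back-of-envelope inference cost for the hypothetical Pro subscriber.
# API list prices include margin, so they overstate Anthropic's real cost.
SONNET_IN = 3.00 / 1_000_000    # dollars per input token
SONNET_OUT = 15.00 / 1_000_000  # dollars per output token

def daily_cost(prompts, tokens_in, tokens_out, rate_in, rate_out):
    """Raw inference cost, in dollars, for one day of use."""
    return prompts * (tokens_in * rate_in + tokens_out * rate_out)

day = daily_cost(15, 50_000, 3_000, SONNET_IN, SONNET_OUT)  # ~$2.93
month = day * 20                # 20-day work month, ~$58.50
shortfall = month - 20          # what the $20 subscription doesn't cover

print(f"daily: ${day:.2f}, monthly: ${month:.2f}, shortfall: ${shortfall:.2f}")
```

Every input here is an assumption from the article, not a published Anthropic figure; swap in your own prompt counts and token sizes to see how fast the shortfall moves.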

And that's Sonnet, the cheaper model.

The Opus problem multiplies everything

Now scale up. A Max 20x subscriber at $200 a month running Opus 4.6 for agentic Claude Code work. These sessions drag entire codebases into the context window. Prompts balloon to 150,000 input tokens. Outputs hit 8,000 tokens. Everything costs more because Opus is the expensive model, and the users who reach for it tend to push it the hardest.

Same napkin, bigger numbers. Thirty Opus prompts at 150,000 input tokens each, at $5/million, runs $22.50 on the input side. Eight-thousand-token outputs at $25/million add $6.00. Daily total: $28.50.

Weekday after weekday, that's $570 a month. The person writing the check pays $200. And this person closes the laptop when the workday ends. The ones who don't close the laptop are the ones Anthropic is really worried about.

The real outliers are worse. Anthropic has acknowledged that some users were running Claude Code continuously, 24 hours a day, automating pipelines and parallel sessions across projects. At even moderate Opus throughput, that's $300 a day in inference compute. Nine thousand dollars a month. On a $200 plan.
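Under the same assumptions, the Opus arithmetic, including the always-on outlier, comes out like this (a sketch; the usage profiles are the article's hypotheticals, and API list prices again stand in as a cost ceiling):

```python
# Napkin math for the hypothetical Max 20x subscriber on Opus 4.6.
OPUS_IN = 5.00 / 1_000_000    # dollars per input token
OPUS_OUT = 25.00 / 1_000_000  # dollars per output token

def daily_cost(prompts, tokens_in, tokens_out, rate_in, rate_out):
    """Raw inference cost, in dollars, for one day of use."""
    return prompts * (tokens_in * rate_in + tokens_out * rate_out)

# 30 agentic prompts a day, 150k tokens in / 8k out per prompt.
heavy_day = daily_cost(30, 150_000, 8_000, OPUS_IN, OPUS_OUT)  # ~$28.50
heavy_month = heavy_day * 20   # weekdays only, ~$570 against a $200 plan

# The 24/7 outlier: roughly $300/day in compute, every day of the month.
outlier_month = 300 * 30       # $9,000 against the same $200 plan

print(f"heavy user: ${heavy_day:.2f}/day, ${heavy_month:.0f}/month")
print(f"always-on outlier: ${outlier_month:,}/month")
```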

Anthropic put it bluntly when it introduced weekly caps last August: some subscribers were consuming "tens of thousands of dollars in model usage" against their flat-rate subscriptions. The company framed the limits as abuse prevention. But abuse is a strong word for passengers who simply filled the seat they paid for. They just used more of it than the airline planned to sell.

The subscription trap every AI company built

This isn't unique to Anthropic. The entire AI subscription model carries the same structural flaw, and every major provider looks anxious about it. Flat monthly fees assume a predictable distribution of usage. Light users subsidize heavy users. The problem is that AI workloads don't distribute the way Netflix streams do. A casual user who asks Claude two questions a day costs almost nothing to serve. A developer running agentic coding loops burns 100x the compute. And the gap between those two profiles keeps widening as the tools get more capable.

OpenAI's books tell the same story at larger scale. The company reported $3.7 billion in revenue for 2024 against roughly $5 billion in total losses. Revenue has grown fast since then, past $25 billion annualized by early 2026. But the cost structure hasn't flipped. Google researchers David Patterson and Xiaoyu Ma published a paper in early 2026 identifying memory bandwidth and interconnect constraints in LLM inference as core technical bottlenecks, the kind of hardware-level problems that don't disappear just because revenue is climbing.

Per-unit inference costs have collapsed. Absurdly so: Stanford's AI Index measured a 280x cost drop for GPT-3.5-level performance between late 2022 and late 2024, and GPUs lose about 30% of their sticker price annually on top of that. But consumption has grown faster than prices have dropped. Agentic workflows hit a model 10 to 20 times per task. Long-context sessions feed hundreds of thousands of tokens with every prompt. Always-on monitoring agents consume compute around the clock without a human in the loop.

Anthropic's annualized revenue has surged past $19 billion as of March 2026, according to Bloomberg, up from $9 billion at the end of 2025. That figure counts gross revenue before hyperscaler revenue-sharing. But even the net figure represents explosive growth, and growth doesn't help when each new subscriber arrives with a negative unit margin. You can't scale your way to profitability by adding customers who cost you three times what they pay.

The Pentagon effect made it worse, faster

The timing of this rate-limiting decision wasn't random. Anthropic signed a $200 million contract with the Pentagon in July 2025, but talks collapsed in September when the two sides couldn't agree on deployment guardrails. The Defense Department designated Anthropic a supply chain risk. A federal judge stepped in Thursday, granting a preliminary injunction that blocked the designation and a broader agency ban. Then she paused her own ruling for seven days so the government could appeal.

The fallout did something Anthropic's marketing never could. ChatGPT uninstalls spiked 295% day over day on February 28, according to TechCrunch and Sensor Tower data. Claude knocked ChatGPT off the top of the U.S. App Store. First time that had ever happened.

All of those new accounts hit the same GPU clusters, the ones that were already running warm before the Pentagon mess even started. A weekend of press coverage can sign up 100,000 people. Getting the GPUs online to serve them takes months, plus Nvidia purchase orders that were probably already spoken for. Demand outran compute, so something had to give: slower responses, lower quality, or tighter limits. Anthropic picked limits. Probably the least bad option available.

The off-peak doubling promotion that ran from March 13 to March 28 looks different in retrospect. It wasn't generosity. It was demand shaping. Moving users to nighttime hours before announcing that daytime hours would cost more.

Where this lands in 12 months

Hours after Anthropic announced its throttling, OpenAI's Codex engineering lead Thibault Sottiaux posted on X: "We have reset Codex usage limits across all plans. You can just build unlimited things with Codex. Have fun!" The post pulled 791,000 views. Not a coincidence. But as Gizmodo's AJ Dellinger noted, OpenAI will maintain unlimited access until it captures enough users to start monetizing them, at which point another company steps in. The cycle repeats until the bill comes due.

Analysts at Avasant compared the pattern to the early days of cloud computing, when AWS, Azure, and Google Cloud faced identical constraints. The answer then was reserved capacity pricing, tiered access, and consumption-based billing. That's where AI subscriptions are heading now.

The $20 all-you-can-eat Claude plan will be remembered the way we remember unlimited data on early smartphone contracts. An overbooked flight that kept selling tickets until the plane couldn't take off. Pareekh Jain, principal analyst at Pareekh Consulting, put it directly: "Since all major vendors are either introducing or will introduce similar constraints, impacted users may not get relief by moving to another vendor platform."

You won't escape this by switching to ChatGPT or Gemini. The math is the same everywhere. The only variable is how long each company can afford to hide it.

Frequently Asked Questions

Why did Anthropic start throttling Claude during peak hours?

Anthropic is adjusting 5-hour session limits on weekdays from 5 to 11 a.m. Pacific to manage compute demand that has outpaced GPU capacity. Weekly limits stay unchanged, but sessions deplete faster during those hours. Roughly 7% of users will hit limits they previously wouldn't have.

How much does a Claude session actually cost Anthropic?

Based on published API rates, a moderate Pro subscriber generates about $2.93/day in inference costs. A heavy Opus user costs roughly $28.50/day. API rates include margin, so actual costs are lower, but even conservative estimates show subscriptions don't cover compute.

Is OpenAI offering unlimited access to compete with Claude?

OpenAI removed Codex caps the same day Anthropic announced throttling. Analysts expect OpenAI to maintain unlimited access until it captures enough market share, then introduce similar constraints. Every provider faces identical inference economics.

What caused the sudden spike in Claude demand?

After Anthropic's standoff with the Pentagon boiled over, ChatGPT uninstalls surged 295% in a single day on February 28. Claude hit #1 on the U.S. App Store. The user influx overwhelmed existing GPU infrastructure.

Will AI subscriptions always have usage limits?

Industry analysts compare the moment to early cloud computing, when AWS and Azure introduced reserved capacity and consumption-based billing. Flat-rate AI subscriptions are likely transitioning to metered or tiered models across all providers.

Maria Garcia
Los Angeles

Bilingual tech journalist slicing through AI noise at implicator.ai. Decodes digital culture with a ruthless Gen Z lens: fast, sharp, relentlessly curious. Bridges Silicon Valley's marble boardrooms, hunting who tech really serves.