Google’s ‘quadrillion tokens’ brag hides a slower story
Google's 1.3 quadrillion token milestone sounds massive—until you see growth rates halving and realize tokens measure server intensity, not customer demand. The slowdown reveals something uncomfortable about AI economics.
Google says its AI now chews through 1.3 quadrillion tokens a month; growth, however, is already decelerating and the metric itself measures compute burn more than customer uptake.
What’s actually new
We now have three clean data points from Google: ~480 trillion tokens in May, ~980 trillion in July, and 1.3 quadrillion in October. The absolute jump is large, but the rate of increase has more than halved—from roughly +250 trillion per month in spring to about +107 trillion per month since July. That’s the tension behind the headline.
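The deceleration is simple arithmetic. A quick sketch, using only the three public milestones cited above:

```python
# Back-of-envelope check of the May -> July -> October token series.
# Figures are in trillions of monthly tokens (1.3 quadrillion = 1,300 trillion).
milestones = {"May": 480, "July": 980, "October": 1300}

# Months elapsed between each pair of data points.
spans = [("May", "July", 2), ("July", "October", 3)]

for start, end, months in spans:
    rate = (milestones[end] - milestones[start]) / months
    print(f"{start} -> {end}: +{rate:.0f} trillion tokens/month")
# May -> July: +250 trillion tokens/month
# July -> October: +107 trillion tokens/month
```

Spring growth of ~250 trillion tokens per month dropping to ~107 trillion is the "more than halved" claim in plain numbers.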
The company bundled the token milestone with the launch of Gemini Enterprise, an “AI front door” for workplace agents and data. That framing suggests surging adoption, yet tokens are primarily a unit of computation: slices of text, audio, or pixels the model ingests and emits. They are not users, seats, or solved workflows.
The Breakdown
• Google's monthly token growth more than halved, from roughly 250 trillion in spring to about 107 trillion since July, even as the total hit 1.3 quadrillion
• Tokens measure computational load per request, not user activity—reasoning models burn 17x more tokens than earlier versions
• Environmental claims use light-prompt medians that exclude heavy reasoning, multimodal, and agent tasks driving actual token growth
• Watch how aggressively Google steers customers to lite models and caps context windows—that reveals true unit economics
The substance behind the spin
Reasoning models inflate token counts because they think more internally. As providers add chain-of-thought-style steps, larger contexts, and multimodal inputs, each request consumes extra tokens even when user activity stays flat. That’s why token totals can soar while growth cools and revenue lags. Analysts at The Decoder made the same point this week, calling the figure “window dressing” for back-end scale. They’re right about the direction of travel.
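To see why totals can soar with zero new demand, here is an illustrative model (invented numbers, not Google data): hold users and requests flat and only raise tokens per request, as a reasoning model does.

```python
# Illustrative only: total tokens = users x requests/user x tokens/request.
# All figures are made-up round numbers to isolate the per-request effect.
def monthly_tokens(users, requests_per_user, tokens_per_request):
    return users * requests_per_user * tokens_per_request

# Same users, same request volume...
baseline = monthly_tokens(users=1_000_000, requests_per_user=100, tokens_per_request=1_000)
# ...but a reasoning model burning 17x the tokens per request (the multiple cited above).
reasoning = monthly_tokens(users=1_000_000, requests_per_user=100, tokens_per_request=17_000)

print(f"{reasoning / baseline:.0f}x token growth with zero new users")
# 17x token growth with zero new users
```

The token curve moves entirely on the third factor, which is an architecture choice, not a customer.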
Pricing changes reinforce the picture. Google’s own updates this summer simplified 2.5 Flash pricing and removed separate “thinking vs. non-thinking” rates—an implicit admission that internal reasoning tokens are a real cost driver that customers notice. Efficiency tweaks can trim invoices; they do not change what tokens fundamentally represent.
Evidence, not vibes
Google’s blog post touting Gemini Enterprise also includes the quadrillion-token milestone, tying it to AI permeating Cloud, Search, and Workspace. Take that as confirmation of deployment breadth, not proof of clear ROI. Tokens prove that systems are busy, not that they deliver dependable outcomes at an acceptable unit cost. The deceleration visible in the May→July→October series is the tell.
On environmental claims, the gap is wider. Google’s technical paper pegs a median Gemini text prompt at 0.24 Wh of electricity, 0.03 g of CO₂e, and 0.26 ml of water—numbers that sound tiny because they measure the lightest, shortest prompts. The same write-up sidesteps heavier cases like long-context analysis, video, and agentic browsing, which are exactly the scenarios pushing token totals higher. Critics flagged the scope issue within hours.
The frame to use
Think of tokens as server-side intensity, not end-user demand. A quadrillion tokens can reflect three forces at once: more users, more work per request, and more modalities per session. Only the first is durable demand. The other two are architecture choices and product bets that can reverse as vendors de-emphasize expensive reasoning, turn on caching, or push cheaper “lite” models for routine tasks. Watch how often Google steers enterprises to Flash-class models and how aggressively it caps context windows and agent steps by default. Those settings reveal the true economics.
Limits and caveats
None of this means the milestone is meaningless. It proves Google can provision capacity at global scale and keep latency tolerable while moving customers into agent workflows. But the slowdown in monthly additions suggests either supply constraints, deliberate throttling to protect unit economics, or simple maturation after a launch burst. Efficiency gains will also push the token curve down without any change in demand. Read the chart accordingly.
The test ahead
The key question is whether Google can translate server-side intensity into net-new revenue per employee for its customers. If Gemini Enterprise consistently replaces license seats, call minutes, or analyst hours, decelerating token growth can coexist with rising sales. If not, a quadrillion tokens is just a very large power bill expressed in syllables.
Why this matters:
• Token inflation masks demand. Counting compute shards, not customers, risks overstating momentum and understating cost discipline.
• Climate math hinges on workload mix. Light-prompt medians undercount heavy reasoning, long context, and multimodal jobs—the very traffic now driving Google’s totals.
❓ Frequently Asked Questions
Q: What exactly is a token in AI terms?
A: A token is the smallest unit an AI model processes—roughly word fragments or syllables. "Hello world" might be two or three tokens depending on the tokenizer. Models break text, images, audio, and video into tokens to process them. More complex inputs and reasoning steps create more tokens, even for the same user request.
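A toy splitter can make the idea concrete. Real tokenizers (BPE, SentencePiece) use learned subword vocabularies, and Gemini's is proprietary, so exact counts will differ; this sketch just shows that token count tracks input size and structure:

```python
import re

# Toy tokenizer: splits into words and punctuation marks.
# Production tokenizers split further into learned subword pieces.
def toy_tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("Hello world"))          # ['Hello', 'world'] -> 2 tokens
print(toy_tokenize("Summarize this PDF."))  # ['Summarize', 'this', 'PDF', '.'] -> 4 tokens
```

The key point for the economics: the model is billed and powered per token, so anything that lengthens inputs or adds hidden reasoning steps raises cost even when the visible answer stays short.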
Q: How much does processing 1.3 quadrillion tokens actually cost Google?
A: Google doesn't disclose internal costs, but industry estimates suggest large-scale AI inference costs $0.10-$0.50 per million tokens depending on model complexity. At the low end, 1.3 quadrillion tokens could represent $130-650 million monthly in compute costs—before accounting for data center overhead, power, and cooling.
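Worked out explicitly, using the industry cost estimates above (assumptions, not disclosed Google figures):

```python
# Rough monthly compute-cost range for 1.3 quadrillion tokens,
# at an assumed $0.10-$0.50 per million tokens (industry estimate).
total_tokens = 1.3e15                 # 1.3 quadrillion
cost_per_million = (0.10, 0.50)       # USD per million tokens

low, high = (total_tokens / 1e6 * c for c in cost_per_million)
print(f"${low/1e6:.0f}M - ${high/1e6:.0f}M per month")
# $130M - $650M per month
```

That range covers inference compute only; data center overhead, power, and cooling sit on top of it.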
Q: Why would Google deliberately throttle growth if they're bragging about scale?
A: Unit economics don't work at scale if compute costs exceed revenue per user. Google may be rate-limiting free users or capping expensive features because reasoning models burn tokens faster than customers generate revenue. The company acknowledged capacity constraints through Q4 2025 in earnings calls, suggesting supply limits matter too.
Q: What makes reasoning models use 17 times more tokens than regular models?
A: Reasoning models perform internal chain-of-thought steps before answering, generating tokens users never see. A simple question might trigger dozens of intermediate calculations, self-corrections, and verification steps. These "thinking tokens" drive up processing costs even when the visible answer stays short.
Q: Are other AI companies reporting similar token growth slowdowns?
A: Most competitors don't publish token metrics publicly, making direct comparisons impossible. OpenAI, Anthropic, and Microsoft report usage in vague terms like "queries" or "active users" rather than computational load. Google's transparency on tokens is unusual—which makes the visible deceleration from 250 trillion to 107 trillion monthly growth particularly telling.
Tech journalist. Lives in Marin County, north of San Francisco. Got his start writing for his high school newspaper. When not covering tech trends, he's swimming laps, gaming on PS4, or vibe coding through the night.