Google Cloud on Wednesday unveiled two new TPUs at Cloud Next 2026, splitting its eighth-generation design into a training chip and an inference chip for the first time in the program's decade-long history. TPU 8t claims 121 FP4 exaflops per 9,600-chip superpod and 2.8x better price-performance than Ironwood. TPU 8i pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM, triple the previous generation, and targets the KV-cache pain that is eating inference margins across the industry. Google Cloud CEO Thomas Kurian called the split "a natural evolution."
Read the spec sheet against Nvidia's latest, though, and the competitive story isn't there. Rubin lands at 50 PFLOPS NVFP4 inference, 35 PFLOPS training, 288 GB HBM4, and 22 TB/s memory bandwidth on a single GPU, per Nvidia's own disclosures. The Register's same-day comparison puts TPU 8t at 12.6 PFLOPS FP4 and 6.5 TB/s HBM bandwidth. TPU 8i at 10.1 PFLOPS and 8.6 TB/s. Per accelerator, it isn't close.
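The two sets of figures at least hang together. A quick back-of-envelope check, using only the numbers above, with the rounding ours:

```python
# Does Google's pod-level claim line up with The Register's per-chip figure?
# 121 FP4 exaflops spread evenly across a 9,600-chip superpod.
pod_exaflops = 121
chips_per_pod = 9_600

per_chip_pflops = pod_exaflops * 1_000 / chips_per_pod  # exa -> peta
print(f"{per_chip_pflops:.1f} PFLOPS per TPU 8t")        # ~12.6 PFLOPS
```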
That is the wrong comparison. It's also the one Google is counting on you to stop making.
Key Takeaways
- Google's eighth-generation TPUs split training and inference onto separate chips, with TPU 8t claiming 121 exaflops per pod and TPU 8i promising 80% better inference performance per dollar.
- Per socket, Nvidia's Rubin still leads TPU 8 on FP4 compute and HBM bandwidth, and MLPerf has no TPU 8 entries to verify Google's claims.
- Anthropic's million-TPU commitment and Meta's reported multibillion-dollar rental deal shift the fight from benchmarks to inference economics.
- Google kept selling Nvidia Vera Rubin instances alongside TPU 8, signaling the goal is cloud margin capture, not Nvidia replacement.
The per-chip war ended in Santa Clara
Nvidia's per-accelerator roadmap is probably safe through the decade. Rubin carries more bandwidth. More FP4. More NVLink per GPU. Its GTC launch in March bolted that lead onto a seven-chip rack platform, a trillion-dollar order book, and a software moat so deep that CUDA has a stronger hiring pipeline than most mid-cap tech companies. Any honest read of the public record puts Nvidia ahead at the chip level, the rack level, and the benchmark level. MLPerf Training v5.1 has Blackwell entries. TPU 8 has none yet.
So Google isn't trying to win there. TPU 8t's pitch is that a 9,600-chip superpod and a 134,000-chip Virgo fabric matter more than peak FP4, if the job is training a frontier model that lives on Google Cloud and speaks JAX. Pathways. MaxText. The customers Google wants at that scale don't buy chips. They buy multi-year capacity contracts. Anthropic's commitment alone covers up to a million TPUs and roughly 3.5 gigawatts of 2027 capacity through Broadcom, per Broadcom's April SEC filing.
Look at the shape of that. Google sold a chip that has no public MLPerf result by locking in a gigawatt-scale customer first. That isn't a hardware strategy. That's a utility strategy, priced in electricity.
Per chip, Nvidia wins. At scale, Google competes. The single-socket numbers still favor Rubin; the cluster, cloud, and capacity numbers tell a different story. The fight isn't benchmarks. It's margin on the incremental gigawatt.
Inference is where Nvidia bleeds
TPU 8i is the more dangerous chip, even though it looks less impressive on paper. Training is one-time. Serving is forever. Every reasoning step, every agent tool call, every retrieved document runs through inference silicon, and every one of those calls shows up on a bill. Google claims 80% better inference performance per dollar than Ironwood. Whether that holds for your model is a question only a pilot answers. But the architectural choices say Google knows what is actually hurting customers.
The 384 MB of on-chip SRAM is the tell. That is three times what Ironwood carried, and it exists for exactly one reason: keep the KV cache close. Reasoning models stack context. MoE models route tokens through experts scattered across the fabric. Long-context agents load retrieval traces into memory and never let go. All of it crushes HBM bandwidth and spikes tail latency, the p99 number that decides whether a product feels broken. Nvidia's Blackwell Ultra attacks the same problem with more HBM and stronger attention cores. Google is answering with more local memory, a new Boardfly topology that halves network diameter, and a Collectives Acceleration Engine that Google says cuts on-chip collective latency by up to 5x. Different theories of the same bottleneck.
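To see why local memory is the lever, put rough numbers on the cache itself. A sketch of the KV-cache arithmetic for a hypothetical long-context model; every dimension below is illustrative, not a spec for anything shipping:

```python
# Rough KV-cache footprint for a single long-context sequence.
# All model dimensions are illustrative placeholders.
num_layers   = 80        # decoder layers
num_kv_heads = 8         # grouped-query attention KV heads
head_dim     = 128
context_len  = 128_000   # a long agent trace
bytes_per_el = 2         # bf16 cache entries

# Keys and values, for every layer, for every token in context.
kv_bytes = 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_el
print(f"{kv_bytes / 2**30:.1f} GiB per sequence")   # ~39 GiB

# At that size the full cache lives in HBM; on-chip SRAM only has to keep
# the slice the attention kernels are currently reading close to compute.
```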
Here is what matters commercially. Enterprises do not care who wins MLPerf. They care about cost per million tokens at a fixed latency target. If TPU 8i serves Gemma, Llama, or Qwen through vLLM TPU and Vertex AI at a material discount to the equivalent Nvidia lane, that workload routes to TPU. One lane. Then another. Nvidia does not lose the customer. It loses the margin on the stable, high-volume, commodity-serving traffic. The exact traffic its pricing power was built on.
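That comparison is simple enough to write down. A sketch, with the hourly rate and throughput as placeholders rather than quoted prices for either vendor:

```python
# Cost per million output tokens at a fixed latency target, the number an
# enterprise actually compares across serving lanes. Inputs are placeholders.
hourly_rate_usd   = 9.00     # hypothetical per-accelerator on-demand rate
tokens_per_second = 2_500    # sustained throughput while holding the p99 SLO

cost_per_million_tokens = hourly_rate_usd / (tokens_per_second * 3600) * 1_000_000
print(f"${cost_per_million_tokens:.2f} per million tokens")   # $1.00 with these inputs
```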
The ecosystem Google can't yet buy
The counterargument is real. Nvidia still owns the defaults. CUDA dominates every serious GPU cluster, and TensorRT-LLM plus NCCL have a decade of production scars the TPU stack cannot match. MLPerf submissions stretch across dozens of partner systems; TPU 8 has not yet cleared even one. Every AI team with more PyTorch engineers than XLA engineers is an Nvidia shop, and that is nearly every AI team in the industry. Google's TorchTPU preview and the vLLM TPU bridge are improving the migration math, but "few configuration changes" is not the same as zero. Inference stacks break on details: unsupported attention variants, speculative decoding compatibility, LoRA adapter handling, custom kernels, quantization drift. Nvidia has more miles on those details than anyone.
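To make the migration math concrete, here is roughly what the portability pitch looks like; the model name and settings are placeholders, and it assumes a vLLM build with the relevant backend installed. The script is the easy part. The details above are where it breaks.

```python
# A serving script that is nominally the same whether vLLM is running on
# GPUs or on its TPU backend. Model and sampling settings are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize the Q3 infrastructure spend."], params)
print(outputs[0].outputs[0].text)

# What the script does not show: whether the quantization format, the LoRA
# adapters, the speculative decoding setup, and any custom kernels the model
# depends on are actually supported on the backend you just switched to.
```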
Google's second limit is that TPUs live in Google Cloud. That is the whole product. You cannot rack TPUs in your own data center. You cannot port TPU workloads to AWS the way you can port Nvidia. For customers who value multi-cloud optionality or on-prem control, that gap is a feature of Nvidia, not a bug.
Which is why Google kept selling Nvidia anyway. Cloud Next announced A5X bare-metal instances powered by Vera Rubin NVL72 alongside the TPU news. Virgo connects both fabrics. The two companies are co-engineering Falcon networking. Thinking Machines Lab, Mira Murati's startup at a $12 billion valuation, signed a multibillion-dollar Google Cloud deal the same day, and what Thinking Machines bought was GB300. Not TPU.
A company trying to kill Nvidia does not make Nvidia's rack a first-class cloud product. Google is trying to capture more of the AI compute margin in its own cloud, whichever chip the customer picks.
What this actually does to Nvidia
The TPU 8 launch does not hurt Nvidia through benchmarks. It hurts Nvidia through optionality. The market's willingness to queue six months for Blackwell capacity was Nvidia's pricing lever. With Anthropic locked into gigawatts of TPU, Meta reportedly renting TPU through a multibillion-dollar deal, and MediaTek designing the TPU 8i inference chip after its Ironwood peripheral work reportedly came in 20 to 30 percent under alternatives, that queue gets shorter. A shorter queue erodes pricing power fast. Not on the installed base, but on every new gigawatt that comes up for negotiation in 2026 and 2027.
Nvidia is a nearly $5 trillion company because it sells scarcity as much as silicon. TPU 8 will not erase that. It will make scarcity contingent on workload. Nvidia still wins for frontier training outside Google Cloud, for heterogeneous PyTorch research, for every workflow built on custom CUDA kernels, and for on-prem deployment where cloud optionality is non-negotiable. For cost-sensitive, high-volume, stable-model inference inside Google Cloud, the answer is newly in play.
That is the fragmentation everyone has been predicting and few wanted to underwrite. It arrives quietly. One inference lane at a time. You will not see it in the 10-K until 2027, and by then the decision will already be done.
Frequently Asked Questions
How are TPU 8t and TPU 8i different?
TPU 8t is the training chip, built around a 9,600-chip superpod with 121 FP4 exaflops and 2 petabytes of shared memory. TPU 8i is the inference chip, pairing 288 GB of HBM with 384 MB of on-chip SRAM, triple the previous generation, to keep agent working sets close to the processor. Google also introduced a new Boardfly network topology for TPU 8i that cuts maximum diameter by more than half.
Does TPU 8 outperform Nvidia Rubin per accelerator?
No. Nvidia's Rubin GPU delivers 50 PFLOPS NVFP4 inference, 35 PFLOPS training, 288 GB HBM4, and 22 TB/s memory bandwidth per socket. The Register's same-day comparison reports TPU 8t at 12.6 PFLOPS FP4 and 6.5 TB/s HBM bandwidth, and TPU 8i at 10.1 PFLOPS and 8.6 TB/s. Google's advantage shows up in pod-scale integration and cloud economics, not raw single-chip specs.
Why is Google still selling Nvidia GPUs?
Cloud Next announced A5X bare-metal instances powered by Vera Rubin NVL72 alongside the TPU news, and Google is co-engineering Falcon networking with Nvidia. Thinking Machines Lab signed a multibillion-dollar Google Cloud deal the same day and bought GB300, not TPU. Google's goal is to capture compute margin inside Google Cloud regardless of which accelerator the customer picks.
Who is committing to TPU 8 capacity?
Anthropic committed to up to one million TPUs and roughly 3.5 gigawatts of next-generation TPU capacity starting 2027, disclosed in Broadcom's April SEC filing. Meta reportedly signed a multibillion-dollar TPU rental deal through Google Cloud. Citadel Securities uses TPUs for trading infrastructure, and Abu Dhabi's G42 has held multiple discussions with Google about TPU access.
Does TPU 8 threaten Nvidia's business?
Not in the installed base. Nvidia retains per-accelerator leadership, broader software support, and a public MLPerf evidence trail. The pressure lands on pricing power. Shorter GPU queues and a more credible TPU alternative erode Nvidia's ability to charge scarcity premiums on incremental gigawatt deals. Margin compression arrives first, benchmark displacement later, and not equally across workload lanes.


