Google unveiled its eighth-generation tensor processing unit in two distinct variants Wednesday at Cloud Next 2026, splitting training and inference work onto separate silicon for the first time in the TPU program's decade-long history. The training-focused TPU 8t delivers 121 exaflops per pod and 2.8x better price/performance than last year's Ironwood chip, while the inference-tuned TPU 8i claims 80% better performance per dollar and pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM to keep agent working sets on the chip itself. Google Cloud CEO Thomas Kurian called the split "a natural evolution," positioning the two-chip architecture against both Nvidia's unified GPU approach and AWS's Trainium and Inferentia lineup as enterprises shift from model experiments to running persistent AI agents around the clock.

Two chips, two workloads

A TPU 8t superpod scales to 9,600 liquid-cooled chips sharing 2 petabytes of high-bandwidth memory, with double the interchip bandwidth of Ironwood. Amin Vahdat, SVP and chief technologist for AI infrastructure at Google, told reporters the split architecture had been in development for two years, before agents went mainstream, based on conversations with DeepMind about where compute would bottleneck next. Training workloads still chase throughput. Agents chase latency.

That is why the TPU 8i uses a new network topology Google calls Boardfly. It replaces the 3D torus layout that training pods use, and a new Collectives Acceleration Engine cuts collective-communication latency by up to 5x. Google doubled the physical CPU hosts per server, now built on its Axion Arm processors, and raised interconnect bandwidth to 19.2 Tb/s for mixture-of-experts models. The result, per Google's own numbers: customers can serve nearly twice the user volume at the same cost.
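
The two claims are the same ratio seen from different angles: 80% better performance per dollar is 1.8x the queries for the same spend, which is where "nearly twice the user volume" comes from. A back-of-envelope check in Python, using entirely made-up baseline figures:

    # Back-of-envelope check with hypothetical figures, not Google's:
    # 80% better performance per dollar = 1.8x queries at flat cost.
    ironwood_queries_per_dollar = 1_000      # invented baseline rate
    tpu8i_queries_per_dollar = ironwood_queries_per_dollar * 1.8

    monthly_spend_usd = 50_000               # invented serving budget
    ironwood_volume = ironwood_queries_per_dollar * monthly_spend_usd
    tpu8i_volume = tpu8i_queries_per_dollar * monthly_spend_usd

    print(tpu8i_volume / ironwood_volume)    # 1.8, i.e. "nearly twice"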

The chip supply chain widens

Broadcom reportedly designs the TPU 8t training silicon, codenamed Sunfish. MediaTek reportedly handles the TPU 8i inference chip, codenamed Zebrafish; its I/O and peripheral designs for the prior Ironwood generation reportedly ran 20 to 30% cheaper than alternatives. Marvell is in talks with Google on a memory processing unit and a second inference TPU. Intel signed on April 9 to supply Xeon CPUs and custom infrastructure processing units for the surrounding data-center layer. TSMC fabricates all of it, reportedly targeting 2nm for late 2027.

MediaTek's stock hit its daily limit on the TPU 8i news, closing at a record TWD 2,090 with a market cap above TWD 3.3 trillion. Anthropic had already lined up the demand: earlier this month, it signed a separate Broadcom-Google agreement for up to a million TPUs, a commitment covering roughly 3.5 gigawatts of capacity starting in 2027.

Workspace Intelligence and the control plane

The chips are half the keynote. The other half is Workspace Intelligence. Think of it as a context layer that sits beneath the whole Workspace suite, from the inbox you scroll at breakfast to the spreadsheet tab you opened this morning. Google says the layer learns your voice. It learns which templates your company uses. It drafts decks and emails that sound authentically like you, or at least that is the sales pitch. Ask Gemini inside Chat can now generate a full slide deck in one prompt, draft invoice reviews by matching new bills against your inbox, or surface meeting times that fit everyone's calendar. A new Workspace MCP server lets third-party apps tap the same plumbing.
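
Google has not published the Workspace MCP server's schema, but MCP itself is JSON-RPC 2.0 under the hood, so a third-party call would look roughly like the sketch below. The endpoint URL and the draft_slide_deck tool are invented for illustration; tools/list and tools/call are standard MCP methods.

    # Hypothetical sketch of a third-party app calling a Workspace MCP
    # server over JSON-RPC 2.0. The endpoint and tool name are
    # assumptions; Google has not published the server's schema.
    import json
    import urllib.request

    MCP_ENDPOINT = "https://workspace.example.com/mcp"  # invented URL

    def mcp_call(method, params, request_id=1):
        """Send one JSON-RPC 2.0 request and return the parsed result."""
        payload = json.dumps({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": method,
            "params": params,
        }).encode("utf-8")
        req = urllib.request.Request(
            MCP_ENDPOINT, data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["result"]

    # List the tools the server exposes, then invoke one.
    # "draft_slide_deck" is an invented example tool.
    tools = mcp_call("tools/list", {})
    deck = mcp_call("tools/call", {
        "name": "draft_slide_deck",
        "arguments": {"prompt": "Q3 revenue review for the board"},
    })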

The cloud division's Q4 revenue jumped 48% to $17.7 billion, per company figures. The backlog is louder. It reached $240 billion by year-end 2025, roughly double where it stood twelve months earlier. Google earmarked a fresh $750 million for partners to sell Gemini-powered agents into enterprise accounts. The company wants the orchestration and governance layer where agents live. The model underneath becomes a commodity.

Nvidia still present

The Nvidia story at Cloud Next is stranger than it looks. Mira Murati's Thinking Machines Lab signed a multi-billion-dollar deal with Google Cloud on Wednesday, reportedly in the single-digit billions, for access to systems built on Nvidia's new GB300 chips. Google is selling Nvidia hardware inside its own cloud while building purpose-built silicon to compete with it. Reported forecasts put Google's TPU shipments at 4.3 million this year and more than 35 million by 2028.

Kurian's framing gets at the shift. "People want systems that were more optimized for training, and separately, systems that were more optimized for inference." Implicator.ai covered Ironwood last April, when Google first positioned a TPU specifically for inference. Twelve months later, the line has forked.

The training race is about who builds the biggest model. The inference race is about who pays the lowest cost per query at scale. Google just said the quiet part out loud. Those are two different chips. Different partners build them, different fabrics connect them, different customers buy them. Nvidia still ships in the same racks.

Frequently Asked Questions

What is the difference between Google's TPU 8t and TPU 8i chips?

TPU 8t is built for training frontier AI models; it scales to 9,600 chips per pod and delivers 121 exaflops of compute. TPU 8i is built for inference, meaning running trained models in production. It claims 80% better performance per dollar than Ironwood and pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM to keep agent working sets on-chip.
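
For a sense of why that memory budget matters, consider the key-value cache a decoder model accumulates per agent session. The model shape in this rough Python sketch is hypothetical, not a disclosed Google model:

    # Rough KV-cache sizing to illustrate "agent working sets on-chip".
    # Model dimensions below are invented for the example.
    layers, kv_heads, head_dim = 80, 8, 128
    bytes_per_value = 2                  # bf16
    context_tokens = 128_000             # long-lived agent session

    # Keys + values, per token, across all layers.
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    session_gb = kv_bytes_per_token * context_tokens / 1e9

    hbm_gb = 288                         # TPU 8i high-bandwidth memory
    print(f"{session_gb:.1f} GB per session, "
          f"~{hbm_gb / session_gb:.0f} sessions per chip before weights")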

When will Google's TPU 8t and 8i chips be generally available?

Google said the chips are expected to be generally available later this year. The design partnerships behind them, including Broadcom for training silicon and MediaTek for inference silicon, reportedly target TSMC's 2nm process for the full generation's rollout in late 2027.

How does Google's split TPU strategy compete with Nvidia?

Nvidia ships a unified GPU line for both training and inference. Google is splitting the workload onto purpose-built silicon to attack inference cost specifically. At Google's query scale, cost per inference determines AI unit economics. Google still sells Nvidia GB300 systems inside Google Cloud, so the competition is overlapping rather than purely head-to-head.

What is Workspace Intelligence?

Workspace Intelligence is a new background context layer Google announced at Cloud Next 2026. It sits beneath the Workspace suite and learns a user's voice, formatting preferences, and company templates. It powers features such as AI Inbox in Gmail, Ask Gemini in Chat, one-prompt slide deck generation, and a new Workspace MCP server for third-party apps.

Who are Google's chip design partners for TPU 8?

Broadcom reportedly designs the TPU 8t training chip, codenamed Sunfish. MediaTek reportedly handles the TPU 8i inference chip, codenamed Zebrafish. Marvell is in talks on a memory processing unit and a second inference TPU. Intel supplies Xeon CPUs and custom infrastructure processing units. TSMC fabricates all of it.
