Google unveiled its eighth-generation tensor processing unit in two distinct variants Wednesday at Cloud Next 2026, splitting training and inference work onto separate silicon for the first time in the TPU program's decade-long history. The training-focused TPU 8t delivers 121 exaflops per pod and 2.8x better price/performance than last year's Ironwood chip, while the inference-tuned TPU 8i claims 80% better performance per dollar and pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM to keep agent working sets on the chip itself. Google Cloud CEO Thomas Kurian called the split "a natural evolution," positioning the two-chip architecture against both Nvidia's unified GPU approach and AWS's Trainium and Inferentia lineup as enterprises shift from model experiments to running persistent AI agents around the clock.

Two chips, two workloads

A TPU 8t superpod scales to 9,600 liquid-cooled chips sharing 2 petabytes of high-bandwidth memory, with double the interchip bandwidth of Ironwood. Amin Vahdat, SVP and chief technologist for AI infrastructure at Google, told reporters the split architecture had been in development for two years, before agents went mainstream, based on conversations with DeepMind about where compute would bottleneck next. Training workloads still chase throughput. Agents chase latency.

That is why the TPU 8i uses a new network topology Google calls Boardfly. It replaces the 3D torus layout that training pods use, and a new Collectives Acceleration Engine cuts collective-communication latency by up to 5x. Google doubled the physical CPU hosts per server, now built on its Axion Arm processors, and raised interconnect bandwidth to 19.2 Tb/s for mixture-of-experts models. The result, per Google's own numbers: customers can serve nearly twice the user volume at the same cost.
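
The two claims are the same ratio seen from different angles: 80% better performance per dollar is 1.8x the queries for the same spend, which is where "nearly twice the user volume" comes from. A back-of-envelope check in Python, using entirely made-up baseline figures:

    # Back-of-envelope check with hypothetical figures, not Google's:
    # 80% better performance per dollar = 1.8x queries at flat cost.
    ironwood_queries_per_dollar = 1_000      # invented baseline rate
    tpu8i_queries_per_dollar = ironwood_queries_per_dollar * 1.8

    monthly_spend_usd = 50_000               # invented serving budget
    ironwood_volume = ironwood_queries_per_dollar * monthly_spend_usd
    tpu8i_volume = tpu8i_queries_per_dollar * monthly_spend_usd

    print(tpu8i_volume / ironwood_volume)    # 1.8, i.e. "nearly twice"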

The chip supply chain widens

Broadcom reportedly designs the TPU 8t training silicon, codenamed Sunfish. MediaTek reportedly handles the TPU 8i inference chip, codenamed Zebrafish; its I/O and peripheral designs for the prior Ironwood generation reportedly ran 20 to 30% cheaper than alternatives. Marvell is in talks with Google on a memory processing unit and a second inference TPU. Intel signed on April 9 to supply Xeon CPUs and custom infrastructure processing units for the surrounding data-center layer. TSMC fabricates all of it, reportedly targeting 2nm for late 2027.

MediaTek's stock hit its daily limit on the TPU 8i news, closing at a record TWD 2,090 with a market cap above TWD 3.3 trillion. Anthropic had already lined up the demand: earlier this month, it signed a separate Broadcom-Google agreement for up to a million TPUs, a commitment covering roughly 3.5 gigawatts of capacity starting in 2027.

Workspace Intelligence and the control plane

The chips are half the keynote. The other half is Workspace Intelligence. Think of it as a context layer that sits beneath the whole Workspace suite, from the inbox you scroll at breakfast to the spreadsheet tab you opened this morning. Google says the layer learns your voice. It learns which templates your company uses. It drafts decks and emails that sound authentically like you, or at least that is the sales pitch. Ask Gemini inside Chat can now generate a full slide deck in one prompt, draft invoice reviews by matching new bills against your inbox, or surface meeting times that fit everyone's calendar. A new Workspace MCP server lets third-party apps tap the same plumbing.
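
Google has not published the Workspace MCP server's schema, but MCP itself is JSON-RPC 2.0 under the hood, so a third-party call would look roughly like the sketch below. The endpoint URL and the draft_slide_deck tool are invented for illustration; tools/list and tools/call are standard MCP methods.

    # Hypothetical sketch of a third-party app calling a Workspace MCP
    # server over JSON-RPC 2.0. The endpoint and tool name are
    # assumptions; Google has not published the server's schema.
    import json
    import urllib.request

    MCP_ENDPOINT = "https://workspace.example.com/mcp"  # invented URL

    def mcp_call(method, params, request_id=1):
        """Send one JSON-RPC 2.0 request and return the parsed result."""
        payload = json.dumps({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": method,
            "params": params,
        }).encode("utf-8")
        req = urllib.request.Request(
            MCP_ENDPOINT, data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["result"]

    # List the tools the server exposes, then invoke one.
    # "draft_slide_deck" is an invented example tool.
    tools = mcp_call("tools/list", {})
    deck = mcp_call("tools/call", {
        "name": "draft_slide_deck",
        "arguments": {"prompt": "Q3 revenue review for the board"},
    })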

The cloud division's Q4 revenue jumped 48% to $17.7 billion, per company figures. The backlog is louder. It reached $240 billion by year-end 2025, roughly double where it stood twelve months earlier. Google earmarked a fresh $750 million for partners to sell Gemini-powered agents into enterprise accounts. The company wants the orchestration and governance layer where agents live. The model underneath becomes a commodity.

Nvidia still present

The Nvidia story at Cloud Next is stranger than it looks. Mira Murati's Thinking Machines Lab signed a multi-billion-dollar deal with Google Cloud on Wednesday, reportedly in the single-digit billions, for access to systems built on Nvidia's new GB300 chips. Google is selling Nvidia hardware inside its own cloud while building purpose-built silicon to compete with it. Reported forecasts put Google's TPU shipments at 4.3 million this year and more than 35 million by 2028.

Kurian's framing gets at the shift. "People want systems that were more optimized for training, and separately, systems that were more optimized for inference." Implicator.ai covered Ironwood last April, when Google first positioned a TPU specifically for inference. Twelve months later, the line has forked.

The training race is about who builds the biggest model. The inference race is about who pays the lowest cost per query at scale. Google just said the quiet part out loud. Those are two different chips. Different partners build them, different fabrics connect them, different customers buy them. Nvidia still ships in the same racks.

Frequently Asked Questions

What is the difference between Google's TPU 8t and TPU 8i chips?

TPU 8t is built for training frontier AI models; it scales to 9,600 chips per pod and delivers 121 exaflops of compute. TPU 8i is built for inference, meaning running trained models in production. It claims 80% better performance per dollar than Ironwood and pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM to keep agent working sets on-chip.
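
For a sense of why that memory budget matters, consider the key-value cache a decoder model accumulates per agent session. The model shape in this rough Python sketch is hypothetical, not a disclosed Google model:

    # Rough KV-cache sizing to illustrate "agent working sets on-chip".
    # Model dimensions below are invented for the example.
    layers, kv_heads, head_dim = 80, 8, 128
    bytes_per_value = 2                  # bf16
    context_tokens = 128_000             # long-lived agent session

    # Keys + values, per token, across all layers.
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    session_gb = kv_bytes_per_token * context_tokens / 1e9

    hbm_gb = 288                         # TPU 8i high-bandwidth memory
    print(f"{session_gb:.1f} GB per session, "
          f"~{hbm_gb / session_gb:.0f} sessions per chip before weights")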

When will Google's TPU 8t and 8i chips be generally available?

Google said the chips are expected to be generally available later this year. The design partnerships behind them, including Broadcom for training silicon and MediaTek for inference silicon, reportedly target TSMC's 2nm process for the full generation's rollout in late 2027.

How does Google's split TPU strategy compete with Nvidia?

Nvidia ships a unified GPU line for both training and inference. Google is splitting the workload onto purpose-built silicon to attack inference cost specifically. At Google's query scale, cost per inference determines AI unit economics. Google still sells Nvidia GB300 systems inside Google Cloud, so the competition is overlapping rather than purely head-to-head.

What is Workspace Intelligence?

Workspace Intelligence is a new background context layer Google announced at Cloud Next 2026. It sits beneath the Workspace suite and learns a user's voice, formatting preferences, and company templates. It powers features such as AI Inbox in Gmail, Ask Gemini in Chat, one-prompt slide deck generation, and a new Workspace MCP server for third-party apps.

Who are Google's chip design partners for TPU 8?

Broadcom reportedly designs the TPU 8t training chip, codenamed Sunfish. MediaTek reportedly handles the TPU 8i inference chip, codenamed Zebrafish. Marvell is in talks on a memory processing unit and a second inference TPU. Intel supplies Xeon CPUs and custom infrastructure processing units. TSMC fabricates all of it.
