Meta chose licensing over acquisition with Midjourney, tapping the profitable AI lab's aesthetic technology while preserving its independence. The deal signals a new model for successful AI startups to monetize expertise without surrendering control.
Apple approaches Google to power Siri's AI brain, marking a potential shift from internal control to external partnerships. The talks reveal constraints facing even tech's most valuable company when core technologies evolve faster than resources can follow.
OpenAI quietly scrapes Google Search results to power ChatGPT while positioning itself as Google's rival. The practice reveals messy dependencies behind AI's search ambitions as user behavior shifts toward longer, conversational queries.
Google says a prompt uses 0.24 Wh. Researchers say the math is incomplete.
Google breaks AI energy silence with first granular data: 0.24 watt-hours per Gemini prompt. But critics challenge what's excluded from the math, setting up a new fight over measuring AI's true environmental cost.
📊 Google disclosed the first granular AI energy data from a major company: each Gemini text prompt uses 0.24 watt-hours, equivalent to nine seconds of TV watching.
🔧 AI chips account for just 58% of energy use, with supporting infrastructure (CPUs, idle machines, cooling) claiming the remaining 42% in real production environments.
⚡ Energy consumption per Gemini prompt dropped 33-fold over 12 months while response quality improved, driven by efficiency innovations under resource constraints.
🚫 Critics challenge the methodology for excluding indirect water use and manufacturing impacts, calling Google's approach misleading about AI's true environmental footprint.
📏 The disclosure sets a precedent for industry measurement standards, potentially forcing other AI companies to report comparable environmental data.
🏭 Rapid efficiency gains challenge utility projections of massive power demand increases, potentially reshaping billions in infrastructure investment decisions.
Google has finally put a number on AI’s everyday footprint, publishing figures that peg a median Gemini text prompt at 0.24 watt-hours of electricity, 0.03 grams of CO₂, and 0.26 milliliters of water—roughly nine seconds of TV and five drops of water—using its new per-prompt energy methodology. It’s the clearest disclosure yet from a major AI provider. It also starts a new fight over what, exactly, should count.
What Google counted—and what it didn’t
The company framed the study as “full-stack” accounting for real-world inference. Chips aren’t the whole story. In Google’s breakdown, TPUs contribute 58% of the 0.24 Wh; host CPUs and memory add 25%; idle capacity kept ready for failover contributes 10%; and data-center overhead, including cooling and power conversion, accounts for the final 8% (the published shares are rounded, which is why they sum to just over 100%). That mix reflects production utilization, not lab peaks. It’s a useful correction to models that assume accelerators run flat-out all day. Real systems don’t.
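The breakdown implies a per-component energy figure for each layer of the stack. A quick sketch, using Google's reported total and percentage shares (which are rounded in the source):

```python
# Decompose Google's reported 0.24 Wh median Gemini text prompt into the
# component shares the study attributes to each layer of the stack.
TOTAL_WH = 0.24
shares = {
    "TPUs (accelerators)":        0.58,
    "Host CPUs + memory":         0.25,
    "Idle / failover capacity":   0.10,
    "Cooling + power conversion": 0.08,
}

for component, share in shares.items():
    print(f"{component:28s} {share:4.0%}  ->  {TOTAL_WH * share:.4f} Wh")

# The published shares are rounded, so they sum to slightly over 100%.
print(f"Sum of shares: {sum(shares.values()):.0%}")
```

The point of the exercise: the non-accelerator slices together cost roughly 0.10 Wh per prompt, which is why "chips-only" estimates undercount real deployments.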
But scope matters. The analysis covers text prompts only. It excludes multimodal tasks and indirect water use in supply chains. And it reports a median prompt, not the long tail of heavyweight queries. That omission is where critics focus. One sentence sums it up: what you don’t measure, you can’t manage.
The headline number—put in context
Per-prompt metrics help users and engineers reason about efficiency. They don’t reveal aggregate impact. Google hasn’t disclosed total Gemini volumes, which would allow outside estimates of daily or annual energy and water demand. That opacity limits policy relevance. Totals drive planning.
Comparisons are tricky, too. OpenAI’s Sam Altman has said an average ChatGPT prompt consumes ~0.34 Wh, higher than Google’s median for Gemini. Both figures apply to text, not image or video generation, which typically cost more. Both also depend on emissions factors. Google uses a market-based metric that reflects its clean-power purchases, which lowers the reported carbon per kWh relative to grid averages. That’s defensible within greenhouse-gas accounting rules. It also makes cross-company comparisons noisy.
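One way to see why the accounting choice matters: back out the emissions factor implied by Google's own per-prompt numbers and compare it to a grid-average factor. The ~400 gCO₂/kWh figure below is an illustrative placeholder for a typical grid mix, not a number from Google's study:

```python
# Emissions factor implied by Google's reported per-prompt figures,
# versus the same prompt scored against an illustrative grid average.
wh_per_prompt = 0.24       # Google's median Gemini text prompt
g_co2_per_prompt = 0.03    # Google's reported carbon per prompt

# Factor implied by Google's market-based accounting.
implied_g_per_kwh = g_co2_per_prompt / (wh_per_prompt / 1000)
print(f"implied market-based factor: {implied_g_per_kwh:.0f} gCO2/kWh")

# Same energy scored at a rough grid-average factor (placeholder value).
grid_avg_g_per_kwh = 400
location_based = (wh_per_prompt / 1000) * grid_avg_g_per_kwh
print(f"same prompt at {grid_avg_g_per_kwh} g/kWh: {location_based:.3f} g CO2")
```

Under these assumptions the location-based figure comes out roughly three times higher than the reported 0.03 g, which is the noise the cross-company comparisons have to absorb.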
Efficiency as a competitive strategy
The most striking claim isn’t the 0.24 Wh. It’s the slope. Google says energy per median Gemini text prompt fell 33× in 12 months, while the per-prompt carbon footprint fell 44×. Those are order-of-magnitude shifts, not incremental trims. Constraints forced it.
How? Three levers stand out. First, architecture: mixture-of-experts routes queries to a tiny slice of a large model, and newer “hybrid reasoning” regimes reduce compute-intensive steps. Second, serving: speculative decoding lets a small model propose tokens that a larger model verifies, cutting sequential work; distillation yields smaller, faster variants (e.g., Flash, Flash-Lite) for high-throughput traffic. Third, hardware-software co-design: the latest TPU generation (Ironwood) targets much higher performance per watt, and Google’s compiler/runtime stack squeezes more useful work from each joule. The strategy shift is explicit: performance per watt over performance per dollar.
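The speculative-decoding lever can be sketched in a few lines. The draft and target models below are toy stand-in functions, not real LLMs, and this greedy-agreement variant omits the probabilistic acceptance rule production systems use; it only shows the shape of the trick, i.e. cheap proposals checked in bulk by the expensive model:

```python
import random

random.seed(0)
VOCAB = list("abcde")

def draft_next(context):
    """Cheap 'draft' model: fast, sometimes wrong (stubbed as random)."""
    return random.choice(VOCAB)

def target_next(context):
    """Expensive 'target' model, treated as ground truth (stubbed)."""
    return VOCAB[len(context) % len(VOCAB)]

def speculative_step(context, k=4):
    """Propose k draft tokens, then keep the prefix the target model
    agrees with, plus one corrected token on the first disagreement,
    so every step makes at least one token of progress."""
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    accepted, ctx = [], list(context)
    for t in proposal:
        if target_next(ctx) == t:      # target agrees: accept draft token
            accepted.append(t)
            ctx.append(t)
        else:                          # disagreement: substitute and stop
            accepted.append(target_next(ctx))
            break
    return context + accepted

print(speculative_step(["a"]))
```

In a real system the target model scores all k proposals in one parallel forward pass, so each accepted token replaces a full sequential decode step of the large model; that is where the energy saving comes from.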
The cautions from researchers
Academics welcomed the granularity—especially the inclusion of idle capacity and overhead long ignored in back-of-the-envelope estimates. It’s real-world accounting. Yet several raised red flags about the message it sends. Per-prompt numbers can normalize a growing footprint if total query volumes explode. The omission of indirect water and manufacturing impacts understates the full life-cycle costs of AI. And a company-run methodology, however detailed, still lacks independent verification. One short takeaway: transparency isn’t the same as auditability.
Why utilities and chipmakers care
If 33× efficiency gains persist, the scarier demand curves in utility planning memos may soften. Many forward projections assume constant utilization near nameplate power and linear scaling of today’s models into tomorrow’s workloads. Those assumptions rarely survive contact with engineering reality—and financial pressure. The more power becomes a hard constraint, the faster model, compiler, and scheduler teams chase watts. That said, “efficiency miracles” don’t eliminate scale effects. Large-language-model deployment is expanding across products, enterprises, and regions. Even steep per-prompt declines can be swamped by growth in prompts. Both things can be true.
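The arithmetic behind that caveat is simple. The prompt volumes below are hypothetical, chosen only to show how a 100x volume increase overwhelms a 33x per-prompt efficiency gain:

```python
# Back-of-the-envelope: steep per-prompt efficiency gains can still be
# outrun by query growth. Volumes are illustrative, not Google's data.
wh_before = 0.24 * 33   # implied per-prompt energy a year earlier
wh_after = 0.24         # today's reported median

def total_wh(per_prompt_wh, daily_prompts):
    return per_prompt_wh * daily_prompts

baseline = total_wh(wh_before, 1e9)    # hypothetical 1B prompts/day
grown = total_wh(wh_after, 100e9)      # 33x more efficient, 100x volume

print(f"before: {baseline / 1e6:,.0f} MWh/day")
print(f"after:  {grown / 1e6:,.0f} MWh/day ({grown / baseline:.1f}x the total)")
```

Even with a 33x efficiency gain, total consumption in this scenario triples, which is why per-prompt numbers alone can't settle the infrastructure debate.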
What standardization should look like next
Google’s paper will be a reference point because it captures production conditions others gloss over. It shouldn’t be the last word. The field needs an auditable, apples-to-apples score: test suites that specify model class and version, prompting regime, context length, batch size, target latency, hardware generation, data-center PUE/WUE, and emissions factors—and that report medians and tails. Think Energy Star, but for AI inference. Until then, every company’s number is, at best, a well-argued estimate bound to its own stack.
The bottom line
Google has moved the conversation from guesswork to measurement. That’s progress. But the gap between per-prompt optics and system-level impact remains wide—and that’s where policy, investment, and public trust will be decided.
Why this matters
Method rules the scoreboard: A credible, shared methodology could force Big Tech to report comparable figures and allow regulators, researchers, and customers to weigh trade-offs honestly.
Planning real infrastructure: Utilities and chipmakers need defensible numbers; separating hype from load will determine where billions in grid, cooling, and fab capacity actually go.
❓ Frequently Asked Questions
Q: How does Gemini's energy use compare to ChatGPT and everyday activities?
A: Google's 0.24 watt-hours per prompt beats OpenAI's 0.34 Wh for ChatGPT by about 30%. For context, that's equivalent to watching TV for 9 seconds, running a microwave for 1 second, or using a 60-watt lightbulb for 14 seconds.
Q: What is the "indirect water use" that critics say Google excluded?
A: Google measured only direct cooling water at data centers (5 drops per prompt). Critics want broader accounting: water used to manufacture chips, generate electricity, and extract materials. This supply chain water can be 10-100x larger than direct use.
Q: How did Google achieve a 33x efficiency improvement in just 12 months?
A: Three main innovations: mixture-of-experts models that activate only relevant parts (10-100x reduction in computation), speculative decoding that lets smaller models draft tokens for larger models to verify, and custom TPU chips that are 30x more energy-efficient than Google's first generation.
Q: Will AI get cheaper to run as efficiency improves?
A: Per-query costs should drop significantly if 33x efficiency gains continue. However, total energy demand could still explode if query volumes grow faster than efficiency improves. Think: 33x more efficient but 100x more queries equals higher total consumption.
Q: Could this disclosure lead to AI energy regulations?
A: Currently no regulations require AI energy disclosure. Google's methodology could become an industry standard, similar to Energy Star ratings for appliances. Regulators need baseline data before setting rules—this provides the first credible benchmark from a major provider.
Bilingual tech journalist slicing through AI noise at implicator.ai. Decodes digital culture with a ruthless Gen Z lens—fast, sharp, relentlessly curious. Bridges Silicon Valley's marble boardrooms, hunting who tech really serves.