Google DeepMind on Thursday released Gemma 4, a family of four open models built from the same research that powers its proprietary Gemini 3. The bigger news sits in the fine print. For the first time, Google is shipping its open models under a standard Apache 2.0 license, abandoning the restrictive custom terms that kept enterprise legal teams nervous for two years.
The shift matters because Google's previous Gemma license included a prohibited-use policy the company could update unilaterally. It required developers to enforce Google's rules across all derivative projects and, according to Ars Technica, could even be read to transfer licensing obligations to AI models trained on Gemma-generated synthetic data. Apache 2.0 eliminates all of that. No custom clauses, no redistribution restrictions, no commercial carve-outs.
Key Takeaways
- Google released Gemma 4 under Apache 2.0, abandoning the restrictive custom license that kept enterprises away for two years
- The 31B dense model ranks third among open models globally, while the 26B MoE activates only 3.8B of 25.2B parameters during inference
- Independent testing shows Alibaba's Qwen 3.5 narrowly beats Gemma 4 on key benchmarks, but Google's models consume less compute per token
- Google confirmed Gemini Nano 4 for Pixel phones will be based directly on Gemma 4's edge models
AI-generated summary, reviewed by an editor. More on our AI guidelines.
Four models, two deployment tiers
Gemma 4 ships in four sizes organized around where they run. The 31-billion-parameter dense model and its 26-billion-parameter MoE sibling target workstations, handling text and images across 256,000-token context windows. Then the small ones. E2B and E4B were built for phones, Raspberry Pi boards, Jetson Orin Nano modules. They get 128,000-token context and native audio, so speech recognition runs entirely on the handset.
The naming convention needs unpacking. VentureBeat reports that the "E" prefix denotes "effective parameters." E2B activates 2.3 billion effective parameters during inference but carries 5.1 billion total, because each decoder layer includes its own embedding table through a technique called Per-Layer Embeddings. The architecture is large on disk but cheap to compute.
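The effective-versus-total split is easy to see with rough arithmetic. The sketch below uses hypothetical layer counts, vocabulary size, and embedding width (none of these dimensions are published in the material above) chosen only so the totals land near E2B's reported 2.3B effective / 5.1B total figures:

```python
# Illustrative Per-Layer Embeddings arithmetic. All dimensions are
# hypothetical, picked so totals land near E2B's reported numbers.
EFFECTIVE = 2.3e9          # weights that run on every forward pass
N_LAYERS = 28              # hypothetical decoder depth
VOCAB = 262_144            # hypothetical vocabulary size
EMBED_DIM = 380            # hypothetical per-layer embedding width

# Each decoder layer carries its own embedding table. Only one row
# per token is gathered per layer, so the tables add disk footprint
# but almost nothing to per-token compute.
per_layer_embeddings = N_LAYERS * VOCAB * EMBED_DIM
total = EFFECTIVE + per_layer_embeddings

print(f"embedding tables: {per_layer_embeddings / 1e9:.1f}B")
print(f"total on disk:    {total / 1e9:.1f}B")
print(f"active compute:   {EFFECTIVE / 1e9:.1f}B")
```

The point survives any particular choice of dimensions: the gap between disk size and active compute is all embedding lookups, which are cheap.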
Inside the 26B MoE model, 128 small experts sit waiting. Eight activate per token, plus one shared always-on expert. Only 3.8 billion of the model's 25.2 billion total parameters fire during inference. That translates to 26B-class intelligence at roughly 4B-class compute costs, according to Google.
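The routing pattern can be sketched in a few lines. The expert counts below come from the figures above; the gating mechanics are simplified (a real router uses learned gates with load balancing and capacity limits, and each expert is a full feed-forward block rather than a single matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, D = 128, 8, 64   # D is a toy hidden size

# Toy expert weights: one linear map per expert for illustration.
experts = rng.standard_normal((N_EXPERTS, D, D)) * 0.02
shared = rng.standard_normal((D, D)) * 0.02   # always-on shared expert
w_gate = rng.standard_normal((D, N_EXPERTS)) * 0.02

def moe_forward(x):
    """Route one token vector through the top-8 of 128 experts plus
    the shared expert, weighting each by its softmax gate score."""
    scores = x @ w_gate
    top = np.argsort(scores)[-TOP_K:]          # indices of top-8 experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()
    out = x @ shared                           # shared expert always fires
    for g, i in zip(gates, top):
        out += g * (x @ experts[i])
    return out, top

y, chosen = moe_forward(rng.standard_normal(D))
print(f"{len(chosen)} of {N_EXPERTS} routed experts fired")
```

Because only 8 of 128 routed experts (plus the shared one) run per token, the vast majority of the parameter count sits idle on any given forward pass, which is where the 3.8B-active-of-25.2B-total figure comes from.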
Benchmarks tell a mixed story
Google positioned the 31B dense model at third place among open models on the Arena AI text leaderboard. But the rankings shift depending on which benchmark you read.
On AIME 2026, the American Invitational Mathematics Examination adapted for AI evaluation, the 31B model scored 89.2% and hit a Codeforces Elo of 2,150. Those numbers would have been frontier-class from closed models not long ago. Gemma 3 27B, by comparison, managed just 20.8% on AIME without thinking mode enabled. The generational leap is real.
Independent testing from Artificial Analysis adds context you won't find in Google's press materials. On GPQA Diamond, a graduate-level science reasoning benchmark, Gemma 4 31B scored 85.7% in reasoning mode. That's second-best among all open models under 40 billion parameters. Alibaba's Qwen 3.5 27B scored 85.8%. The gap is a rounding error, but it means Google's flagship open model doesn't hold the crown. Qwen 3.5 also beats Gemma 4 on MMLU-Pro (86.1% vs. 85.2%) and Humanity's Last Exam, where Chinese models dominate.
Where Gemma 4 claws back ground is efficiency. The 31B model reportedly completes the same benchmark suite in about 1.2 million output tokens, versus 1.5 million for Qwen 3.5 27B and 1.6 million for Qwen 3.5 35B. For teams paying per GPU-hour, that spread adds up.
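The token spread translates to cost roughly as follows. The token counts are the reported suite totals; the per-token price is an assumption, held identical across models to isolate the efficiency gap:

```python
# Hypothetical cost comparison. Token counts are the reported
# benchmark-suite totals; the price is an assumed flat rate, not
# any provider's published pricing.
PRICE_PER_M = 0.50   # assumed dollars per million output tokens

suites = {
    "Gemma 4 31B":  1.2e6,
    "Qwen 3.5 27B": 1.5e6,
    "Qwen 3.5 35B": 1.6e6,
}
for model, tokens in suites.items():
    cost = tokens / 1e6 * PRICE_PER_M
    print(f"{model}: ${cost:.2f} per suite run")
```

At any flat rate, the 31B model's run comes in 20 to 25 percent cheaper than either Qwen variant, and the gap scales linearly with volume.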
The license is the real product
For two years, enterprises evaluating open-weight models faced an awkward choice. Google's Gemma line performed well, but legal departments flagged the custom license. Compliance teams hesitated. And many organizations chose Mistral or Alibaba's Qwen instead, because those shipped under terms lawyers already understood.
Hugging Face CEO Clement Delangue did not hedge. "A huge milestone," he called the Apache 2.0 switch. Demis Hassabis went further, branding Gemma 4 "the best open models in the world for their respective sizes." That's the Google DeepMind CEO selling hard, but the licensing shift gives the claim teeth.
The timing carries its own signal. Chinese AI labs have started pulling back from fully open releases, VentureBeat reports. Alibaba's latest Qwen models, including Qwen 3.5 Omni and Qwen 3.6 Plus, arrived with tighter restrictions. Google is moving in the opposite direction, opening up while competitors close down.
What actually ships on your phone
Google confirmed to Ars Technica that Gemini Nano 4, the on-device model running on Pixel phones, will be based directly on Gemma 4's E2B and E4B. That's the first official confirmation of the next Nano generation. Developers can prototype agentic workflows in the AICore Developer Preview now, with forward compatibility promised when Nano 4 launches later this year.
The edge models punch above expectations. E4B hit 42.5% on AIME 2026 and 52.0% on LiveCodeBench, strong numbers for a model that runs on a T4 GPU. Google's audio encoder shrank from 681 million parameters to 305 million, and frame duration dropped from 160 milliseconds to 40, making real-time transcription viable on a handset.
Function calling is baked in at the architecture level across all four models, drawing on research from Google's FunctionGemma release late last year. Earlier open models faked it. Developers would write elaborate prompts hoping the model cooperated with structured tool calls. Hit or miss. Gemma 4 trains the capability from scratch.
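What native tool calling buys is predictability: the model emits a machine-parseable call instead of prose that might or might not match a format. The sketch below assumes a JSON call format; the schema shape and wire format are illustrative assumptions, not Gemma 4's actual API:

```python
import json

# Hypothetical tool registry and dispatch loop. The declaration
# schema and the model's output format are illustrative only.
TOOLS = {
    "get_weather": {
        "description": "Current weather for a city",
        "parameters": {"city": "string"},
        "fn": lambda city: {"city": city, "temp_c": 18},  # stub result
    },
}

def dispatch(model_output: str):
    """Parse a structured tool call emitted by the model and run it.
    A natively trained model emits valid JSON reliably; prompt-only
    approaches had to hope the output came out parseable."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    return tool["fn"](**call["arguments"])

# Simulated model output in the assumed format:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Lisbon"}}')
print(result)   # {'city': 'Lisbon', 'temp_c': 18}
```

With prompt-based tool calling, the `json.loads` line is where things broke. Training the format in at the architecture level moves the failure mode from "unparseable output" to ordinary error handling.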
The ecosystem play
Developers have pulled down Google's open models 400 million times since February 2024, spinning off more than 100,000 community variants along the way. That installed base existed before this release. Some of the stranger spinoffs hint at the range: MedGemma reads medical imaging, DolphinGemma parses dolphin vocalizations, and SignGemma handles sign language translation. Not the obvious use cases, which is the point.
NVIDIA announced day-one optimization for Gemma 4 across its GPU lineup, from Jetson Orin Nano to Blackwell. The models work with Ollama, llama.cpp, MLX, LM Studio, vLLM, and Unsloth out of the box. Google Cloud offers serverless deployment through Cloud Run with RTX Pro 6000 GPUs that scale to zero when idle.
Holger Mueller at Constellation Research framed it as land-and-expand: "Google is building its lead in AI, not only by pushing Gemini, but also open models with the Gemma 4 family. These are important for building an ecosystem of AI developers."
That ecosystem argument is the one worth watching. For anyone evaluating open models in production, the licensing question on Google's side just disappeared. What decides the race between Gemma 4, Qwen 3.5, and Meta's Llama won't be benchmark margins. It'll be tooling depth, fine-tuning community, and which model family gets you from prototype to production with the fewest surprises.
Frequently Asked Questions
What is Gemma 4?
Google DeepMind's latest family of four open AI models (E2B, E4B, 26B MoE, 31B Dense) built from the same research as Gemini 3. All four ship under the Apache 2.0 license, a first for Google's open model line.
Why does the Apache 2.0 license matter?
Previous Gemma versions used a custom Google license with restrictions that enterprise legal teams found problematic. Apache 2.0 allows unrestricted commercial use, modification, and redistribution with no custom clauses.
How does Gemma 4 compare to Chinese open models like Qwen 3.5?
Alibaba's Qwen 3.5 27B narrowly beats Gemma 4 31B on benchmarks like GPQA Diamond (85.8% vs. 85.7%) and MMLU-Pro. But Gemma 4 consumes fewer output tokens to complete the same benchmarks, which lowers compute costs for teams paying per GPU-hour.
Can Gemma 4 run on a phone?
Yes. The E2B and E4B edge models run on smartphones, Raspberry Pi, and Jetson Orin Nano with native audio for speech recognition. Google confirmed these models form the basis for the upcoming Gemini Nano 4.
Where can developers download Gemma 4?
Model weights are available on Hugging Face, Kaggle, and Ollama. Day-one framework support includes vLLM, llama.cpp, MLX, LM Studio, Ollama, and Unsloth. Google AI Studio hosts the 31B and 26B models for testing.