LLM Meter — Week of May 24

---GEMINI---
score: 90
trend: up
change: +2
+ Gemini 3.5 Flash scores 55 on Artificial Analysis, within 2 points of Claude Opus 4.7 and 5 of GPT-5.5 at roughly a third of the per-token cost
+ Named enterprise adopters at launch (Salesforce Agentforce, Databricks, Ramp, Xero) turn I/O into procurement-grade distribution
+ I/O cleared the post-launch downside risk flagged last week; the feared "lands behind GPT-5.5" framing did not materialize
+ Antigravity agent-platform upgrade plus Google Spark extend the agentic stack enterprise buyers can standardize on
- Flagship Gemini 3.5 Pro slips to next month and the DeepMind Pentagon-work letter stays unresolved
---CLAUDE---
score: 86
trend: down
change: -1
+ Claude Opus 4.7 remains the reference flagship rivals benchmark against, holding the top of the quality field
- Gemini 3.5 Flash now lands within 2 points of Opus 4.7 at roughly a third of the per-token cost, pressuring Claude's premium positioning
- June 15 Agent SDK metering plus unpublished plan-limit denominators keep the procurement-trust drag live
- No new enterprise win this week after the PwC/SAP/Gates run, ceding the cycle to Google and OpenAI
- Pentagon IL6/IL7 classified-network exclusion still unresolved
---CHATGPT---
score: 85
trend: up
change: +1
+ OpenAI-Dell deal brings Codex to hybrid and on-prem environments via the Dell AI Data Platform, a real regulated-buyer unlock
+ Gartner named OpenAI a Leader in enterprise coding agents, a procurement-cycle credential
+ New $234M Singapore Applied AI Lab, OpenAI's first outside the US, deepens sovereign-enterprise reach
+ GPT-5.5 now fully rolled out across Plus, Business, and Enterprise tiers
- Roughly $14B in projected 2026 losses and the $600B compute commitment keep the financial overhang in place
---MISTRAL---
score: 72
trend: down
change: -1
+ Acquired Vienna's Emmi AI to add physics-aware industrial modeling to the enterprise stack
+ EU-sovereign procurement option intact via the Paris/Sweden buildout and AI Act compliance dossier
- A quiet week with only a tuck-in acquisition while every top-tier rival shipped a major enterprise signal
- Benchmark standing trails Opus 4.7, GPT-5.5, and now Gemini 3.5 Flash on price/performance
- Remains outside the Pentagon classified-network vendor roster
---GROK---
score: 31
trend: up
change: +1
+ Heavy product cadence: Grok Skills (May 18), third-party connectors (May 22), and the Grok Build coding agent landed in quick succession
+ Grok V9 Medium completed training (May 25), reportedly 1.5T parameters, signaling roadmap momentum
- SpaceX's IPO filing disclosed that xAI burned $6.4B last year, alongside another round of layoffs and a Grok-team restructure
- New features are developer/consumer-facing; no federal, compliance, or enterprise-procurement progress
- "Chatbot no one uses" narrative persisted into this week, capping any adoption credit
---DEEPSEEK---
score: 16
trend: up
change: +1
+ Permanently cut V4-Pro API price 75% to ~$0.44 per million input tokens, under a tenth of GPT-5.5, reinforcing cost leadership
+ First external funding round ($45-50B, Tencent/Alibaba/Big Fund) continues, easing the undercapitalization argument
- V4-Pro ranks 9th globally at 63.87% accuracy; the quality gap to Western flagships persists
- The aggressive cut also reads as margin pressure as open-weight rivals close the gap
- US government-device bans and the broader compliance perimeter remain fully in force

Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai
