LLM Meter — Week of May 31

---CLAUDE---
score: 91
trend: up
change: +5
+ Opus 4.8 (May 28) hits 69.2% on SWE-bench Pro, about ten points ahead of GPT-5.5, retaking the lead on the coding benchmark enterprises measure against
+ $65B Series H at a $965B valuation passes OpenAI (~$852B) to make Anthropic the most valuable AI lab, with run-rate revenue tripling to ~$47B
+ Opus 4.8 shipped day-one on Amazon Bedrock, Google Vertex AI, and Microsoft Foundry, the three clouds regulated buyers already procure through
+ Claude Code "dynamic workflows" (hundreds of parallel subagents for codebase migrations) lands on Enterprise, Team, and Max plans
- Agent SDK metering starting June 15, opaque plan-limit denominators, and a brief May 28 billing blip keep the procurement-trust drag alive
---GEMINI---
score: 88
trend: down
change: -2
+ Vertex AI, now consolidated into the Gemini Enterprise Agent Platform, unifies training, deployment, and agent orchestration in one stack
+ I/O adopters (Salesforce Agentforce, Databricks, Ramp, Xero) plus Spark and Deep Research Max keep Google's distribution procurement-grade
- Flagship Gemini 3.5 Pro slips to June, leaving Flash to carry the lineup the week Anthropic ships a new top-of-field Opus
- A quiet post-I/O week lets Anthropic take both the coding lead and the most-valuable-lab title, dropping Gemini out of first
- DeepMind's Pentagon-work letter and classified-network questions remain unresolved
---CHATGPT---
score: 84
trend: down
change: -1
+ ChatGPT Enterprise added Skills governance, tighter upload scanning, and expanded Compliance Logs Platform support
+ Codex gains Windows Computer Use plus a GitHub Enterprise Server template for customer-hosted repos, extending into on-prem workflows
- GPT-5.5 now trails Opus 4.8 by about ten points on SWE-bench Pro, ceding the enterprise coding-agent quality lead
- Anthropic's $965B raise eclipses OpenAI's ~$852B valuation, ending its run as the most valuable AI lab
- About $14B in projected 2026 losses against the $600B compute commitment keeps the financial overhang in place
---MISTRAL---
score: 73
trend: up
change: +1
+ Airbus partnership spans commercial aircraft, helicopters, defense, and space, a marquee European industrial-and-defense reference account
+ New "Vibe" platform (rebranded Le Chat) pushes Mistral from model vendor into enterprise workflow software for email, documents, and code
+ Mistral for Industrial Engineering folds Emmi AI physics simulation into a stack for aerospace, automotive, and semiconductor buyers
+ €4B France/Sweden data-center buildout and a stated plan to explore custom chips reinforce the EU-sovereign supply story
- Benchmarks still trail Opus 4.8, GPT-5.5, and Gemini 3.5 Flash, and Mistral remains outside the Pentagon classified-network roster
---GROK---
score: 30
trend: down
change: -1
+ Grok V9-Medium (1.5T parameters, trained on Cursor developer data) finished training May 25, with a coding-focused release targeted for mid-June
+ Quality Mode for the Grok Imagine API adds higher-realism generation for enterprise creative teams
- No federal, compliance, or enterprise-procurement progress while rivals shipped flagships and a record raise
- The SpaceXAI structure and $1.25B/month Anthropic compute deal cast xAI more as a GPU landlord to rivals than a Grok enterprise vendor
- "Reckless financials" framing persists after the SpaceX IPO disclosed heavy cash burn against ~$500M ARR
---DEEPSEEK---
score: 17
trend: up
change: +1
+ Reported funding round would value DeepSeek at ~$44–45B (Big Fund in talks to lead, Tencent and Alibaba participating), ~4x the figure floated two weeks earlier
+ Permanent 75% V4-Pro price cut to ~$0.44 per million input tokens holds the global cost-leadership floor at under a tenth of GPT-5.5's price
- The round is still "in talks," not closed, and Big Fund state backing deepens the US-procurement compliance problem
- V4-Pro quality still trails Western flagships (~9th globally), a gap that just widened against Opus 4.8
- US government-device bans and the broader compliance perimeter remain fully in force

Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai
