OpenAI released GPT-5.5 on April 23, 2026, at list prices of $5 per million input tokens and $30 per million output tokens. The model rolls out first to the ChatGPT Plus, Pro, Business, and Enterprise tiers, with API access to follow later. The company reports 82.7% on Terminal-Bench 2.0 and roughly 40% fewer output tokens than GPT-5.4, though the benchmark owner's leaderboard showed 82.0% plus or minus 2.2 the same day. API list prices double GPT-5.4's rates, and third-party hallucination tests put the model well behind Anthropic's Claude Opus 4.7.


The benchmarks OpenAI is selling

Four scores anchor the launch. They cover agentic coding, computer-use automation, software engineering, and web navigation. Terminal-Bench is the one to watch, because the benchmark owner's own leaderboard shows 82.0% plus or minus 2.2, which is statistically consistent with OpenAI's 82.7% figure but not identical to the number the company chose to headline.

Launch-Day Benchmarks

| Benchmark | What it tests | GPT-5.5 | Independent |
|---|---|---|---|
| Terminal-Bench 2.0 | Agentic terminal coding | 82.7% | 82.0% ±2.2 |
| OSWorld-Verified | Computer-use automation | 78.7% | n/a |
| SWE-Bench Pro | Software engineering | 58.6% | n/a |
| BrowseComp | Web navigation | 84.4% | n/a |

Source: OpenAI launch page (April 23, 2026); Terminal-Bench 2.0 leaderboard.

Pricing, stacked against the room

GPT-5.5 lists at exactly double GPT-5.4 on both input and output, with cached input at $0.50 per million. GPT-5.5 Pro is available first in ChatGPT for Pro, Business, and Enterprise users; OpenAI says an API version will follow "very soon" at $30 input and $180 output per million tokens, six times the standard tier. Claude Opus 4.7 matches GPT-5.5 on input and undercuts it on output, but Anthropic warns its tokenizer expands English text by 1.0 to 1.35 times, which narrows the real gap.

Per-Million-Token Pricing, Flagship Models

| Model | Input | Output | Tier |
|---|---|---|---|
| GPT-5.5 | $5 | $30 | Flagship; cached input $0.50 |
| GPT-5.5 Pro | $30 | $180 | Announced for API "very soon"; not live at launch |
| Claude Opus 4.7 | $5 | $25 | Premium reasoning |
| GPT-5.4 | $2.50 | $15 | Prior flagship |
| Claude Sonnet 4.6 | $3 | $15 | Enterprise default |

Prices in USD per 1M tokens. Sources: OpenAI and Anthropic pricing pages. Claude's tokenizer expands English text roughly 1.0 to 1.35 times, per Anthropic documentation.
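The tokenizer caveat can be folded into a rough effective-price comparison. A minimal sketch, under illustrative assumptions: a hypothetical workload of 1M input and 0.2M output tokens as counted by GPT-5.5's tokenizer, with Anthropic's stated 1.0–1.35× expansion applied uniformly to Claude's token counts.

```python
def workload_cost(input_price, output_price, input_mtok, output_mtok, expansion=1.0):
    """USD cost of a workload. Prices are per million tokens, counts are in
    millions, and `expansion` scales the counts for a different tokenizer."""
    return (input_price * input_mtok + output_price * output_mtok) * expansion

# Hypothetical workload: 1M input tokens, 0.2M output tokens (GPT-5.5 tokenizer).
gpt55     = workload_cost(5.00, 30.00, 1.0, 0.2)                   # $11.00
opus_low  = workload_cost(5.00, 25.00, 1.0, 0.2, expansion=1.0)    # $10.00
opus_high = workload_cost(5.00, 25.00, 1.0, 0.2, expansion=1.35)   # $13.50
```

At the low end of the expansion range, Opus 4.7's nominal output discount holds; at the high end it flips, which is the sense in which tokenizer expansion "narrows the real gap."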

Strengths and weaknesses, side by side

The real pitch to buyers is workflow economics, not raw price. GPT-5.5 bills more per million tokens but needs fewer of them per task. The offsetting weakness sits on the factuality side, where independent tests put the model a long way behind Claude Opus 4.7.

Strengths vs Weaknesses

| Strength | Evidence | Weakness | Evidence |
|---|---|---|---|
| Token efficiency | ~40% fewer output tokens vs GPT-5.4 on Artificial Analysis xhigh | Hallucination rate | 86% on AA-Omniscience vs Claude Opus 4.7 at 36% |
| Agentic coding | 82.7% Terminal-Bench 2.0 | List-price jump | 2× GPT-5.4 on both input and output |
| Latency parity | Matches GPT-5.4 per-token serving speed | API delay | "API deployments require different safeguards" |
| Native omnimodality | Text, image, audio, video in one system | Safety re-check | UK AISI found a universal jailbreak; fix not re-verified |

Sources: OpenAI, Artificial Analysis, UK AI Safety Institute disclosure, AA-Omniscience benchmark.

Where the model actually pulls its weight

Match the task to the scorecard. GPT-5.5 rewards workflows where tokens are expensive and chain-of-actions is long. For look-up-style tasks where facts matter more than orchestration, the math tilts toward Claude Opus 4.7 or Sonnet 4.6.

Best Use Cases, Calibrated to the Data

| Use case | Fit | Why |
|---|---|---|
| Agentic coding via Codex | Strong | 82.7% Terminal-Bench; 40% fewer output tokens per run |
| Computer-use automation | Strong | 78.7% on OSWorld-Verified, per OpenAI launch data |
| Deep web research flows | Strong | 84.4% BrowseComp; native omnimodality |
| Long-document drafting | Solid | Expanded context handling at parity latency |
| High-stakes factual Q&A | Weak | 86% hallucination rate on AA-Omniscience |
| Offensive security research | Blocked | Cyber safeguards active; use GPT-5.4-Cyber instead |

Fit calls based on launch-day benchmark positioning and third-party evaluations.

Who gets it, and when

Rollout is staged by product surface, not by API SLA. ChatGPT subscribers get the model today; developers on the API wait. That split is the single biggest procurement signal in the launch, because it means no third-party product can integrate GPT-5.5 until OpenAI lifts the safeguard gate.

Availability, April 23, 2026

| Surface | GPT-5.5 | GPT-5.5 Pro |
|---|---|---|
| ChatGPT Plus | Available | n/a |
| ChatGPT Pro | Available | Available |
| ChatGPT Business | Available | Available |
| ChatGPT Enterprise | Available | Available |
| Codex | Available | n/a |
| OpenAI API | Delayed | Delayed |

OpenAI cites "API deployments require different safeguards" as the reason for the delay.

Frequently Asked Questions

How much does GPT-5.5 cost per million tokens?

$5 input, $30 output, and $0.50 cached input per million tokens for standard GPT-5.5. GPT-5.5 Pro is announced at $30 input and $180 output per million tokens; OpenAI says it will arrive in the API "very soon" but it is not live at launch. Standard GPT-5.5 is exactly double GPT-5.4's $2.50 and $15.

Why does OpenAI's Terminal-Bench number differ from the leaderboard?

OpenAI reports 82.7% on its launch page, while the benchmark owner's leaderboard showed 82.0% plus or minus 2.2 the same day. The two figures are consistent within the confidence interval, but the gap between the headline number and the leaderboard is editorially notable on launch day.

Is GPT-5.5 cheaper to run than GPT-5.4?

No on list price, yes on output-token volume. Artificial Analysis finds GPT-5.5 xhigh runs about 20% more expensive on its index, but the model uses roughly 40% fewer output tokens, which can lower total cost per completed workflow depending on the task.
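The trade-off is easy to check against the list prices. A sketch under illustrative assumptions: a hypothetical task that consumes 50K input tokens and would emit 20K output tokens on GPT-5.4, with GPT-5.5 emitting 40% fewer.

```python
def task_cost(input_price, output_price, input_mtok, output_mtok):
    """USD cost of one task; prices per million tokens, counts in millions."""
    return input_price * input_mtok + output_price * output_mtok

# Hypothetical task: 0.05M input tokens; GPT-5.4 emits 0.02M output tokens.
gpt54 = task_cost(2.50, 15.00, 0.05, 0.02)         # $0.425
gpt55 = task_cost(5.00, 30.00, 0.05, 0.02 * 0.6)   # $0.61, with ~40% fewer output tokens
```

On a like-for-like task, doubled prices outweigh the token savings: output-side cost alone lands at 2× price × 0.6 tokens = 1.2×, matching the roughly 20% index gap. Any net savings per completed workflow would have to come from finishing tasks in fewer steps or retries, not from per-token math.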

How does GPT-5.5 compare to Claude Opus 4.7 on hallucination?

On Artificial Analysis's AA-Omniscience benchmark, GPT-5.5 hallucinates on 86% of items and Claude Opus 4.7 on 36%. The gap is wide enough to matter for factual Q&A workloads.

When will GPT-5.5 hit the API?

OpenAI has not given a date. The launch page says "API deployments require different safeguards," which reflects ongoing work after a universal jailbreak was identified during UK AI Safety Institute testing.

