OpenAI released GPT-5.5 on April 23, 2026, at list prices of $5 per million input tokens and $30 per million output tokens. The model rolls out first to the ChatGPT Plus, Pro, Business, and Enterprise tiers, with API access to follow later. The company reports 82.7% on Terminal-Bench 2.0 and roughly 40% fewer output tokens than GPT-5.4, though the benchmark owner's leaderboard showed 82.0% plus or minus 2.2 the same day. API list prices double GPT-5.4's rates, and third-party hallucination tests put the model well behind Anthropic's Claude Opus 4.7.


The benchmarks OpenAI is selling

Four scores anchor the launch. They cover agentic coding, computer-use automation, software engineering, and web navigation. Terminal-Bench is the one to watch, because the benchmark owner's own leaderboard shows 82.0% plus or minus 2.2, which is statistically consistent with OpenAI's 82.7% figure but not identical to the number the company chose to headline.

Launch-Day Benchmarks

| Benchmark | What it tests | GPT-5.5 | Independent |
|---|---|---|---|
| Terminal-Bench 2.0 | Agentic terminal coding | 82.7% | 82.0% ±2.2 |
| OSWorld-Verified | Computer-use automation | 78.7% | n/a |
| SWE-Bench Pro | Software engineering | 58.6% | n/a |
| BrowseComp | Web navigation | 84.4% | n/a |

Source: OpenAI launch page (April 23, 2026); Terminal-Bench 2.0 leaderboard.

Pricing, stacked against the room

GPT-5.5 lists at exactly double GPT-5.4 on both input and output, with cached input at $0.50 per million. GPT-5.5 Pro is available first in ChatGPT for Pro, Business, and Enterprise users; OpenAI says an API version will follow "very soon" at $30 input and $180 output per million tokens, six times the standard tier. Claude Opus 4.7 matches GPT-5.5 on input and undercuts it on output, but Anthropic warns its tokenizer expands English text by 1.0 to 1.35 times, which narrows the real gap.

Per-Million-Token Pricing, Flagship Models

| Model | Input | Output | Tier |
|---|---|---|---|
| GPT-5.5 | $5 | $30 | Flagship; cached input $0.50 |
| GPT-5.5 Pro | $30 | $180 | Announced for API "very soon"; not live at launch |
| Claude Opus 4.7 | $5 | $25 | Premium reasoning |
| GPT-5.4 | $2.50 | $15 | Prior flagship |
| Claude Sonnet 4.6 | $3 | $15 | Enterprise default |

Prices in USD per 1M tokens. Sources: OpenAI and Anthropic pricing pages. Claude's tokenizer expands English text roughly 1.0 to 1.35 times, per Anthropic documentation.
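The tokenizer caveat can be folded into a rough effective-price comparison. A minimal sketch, under illustrative assumptions: a hypothetical workload of 1M input and 0.2M output tokens as counted by GPT-5.5's tokenizer, with Anthropic's stated 1.0–1.35× expansion applied uniformly to Claude's token counts.

```python
def workload_cost(input_price, output_price, input_mtok, output_mtok, expansion=1.0):
    """USD cost of a workload. Prices are per million tokens, counts are in
    millions, and `expansion` scales the counts for a different tokenizer."""
    return (input_price * input_mtok + output_price * output_mtok) * expansion

# Hypothetical workload: 1M input tokens, 0.2M output tokens (GPT-5.5 tokenizer).
gpt55     = workload_cost(5.00, 30.00, 1.0, 0.2)                   # $11.00
opus_low  = workload_cost(5.00, 25.00, 1.0, 0.2, expansion=1.0)    # $10.00
opus_high = workload_cost(5.00, 25.00, 1.0, 0.2, expansion=1.35)   # $13.50
```

At the low end of the expansion range, Opus 4.7's nominal output discount holds; at the high end it flips, which is the sense in which tokenizer expansion "narrows the real gap."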

Strengths and weaknesses, side by side

The real pitch to buyers is workflow economics, not raw price. GPT-5.5 bills more per million tokens but needs fewer of them per task. The offsetting weakness sits on the factuality side, where independent tests put the model a long way behind Claude Opus 4.7.

Strengths vs Weaknesses

| Strength | Evidence | Weakness | Evidence |
|---|---|---|---|
| Token efficiency | ~40% fewer output tokens vs GPT-5.4 on Artificial Analysis xhigh | Hallucination rate | 86% on AA-Omniscience vs Claude Opus 4.7 at 36% |
| Agentic coding | 82.7% Terminal-Bench 2.0 | List-price jump | 2× GPT-5.4 on both input and output |
| Latency parity | Matches GPT-5.4 per-token serving speed | API delay | "API deployments require different safeguards" |
| Native omnimodality | Text, image, audio, video in one system | Safety re-check | UK AISI found a universal jailbreak; fix not re-verified |

Sources: OpenAI, Artificial Analysis, UK AI Safety Institute disclosure, AA-Omniscience benchmark.

Where the model actually pulls its weight

Match the task to the scorecard. GPT-5.5 rewards workflows where tokens are expensive and chain-of-actions is long. For look-up-style tasks where facts matter more than orchestration, the math tilts toward Claude Opus 4.7 or Sonnet 4.6.

Best Use Cases, Calibrated to the Data

| Use case | Fit | Why |
|---|---|---|
| Agentic coding via Codex | Strong | 82.7% Terminal-Bench; 40% fewer output tokens per run |
| Computer-use automation | Strong | 78.7% on OSWorld-Verified, per OpenAI launch data |
| Deep web research flows | Strong | 84.4% BrowseComp; native omnimodality |
| Long-document drafting | Solid | Expanded context handling at parity latency |
| High-stakes factual Q&A | Weak | 86% hallucination rate on AA-Omniscience |
| Offensive security research | Blocked | Cyber safeguards active; use GPT-5.4-Cyber instead |

Fit calls based on launch-day benchmark positioning and third-party evaluations.

Who gets it, and when

Rollout is staged by product surface, not by API SLA. ChatGPT subscribers get the model today; developers on the API wait. That split is the single biggest procurement signal in the launch, because it means no third-party product can integrate GPT-5.5 until OpenAI lifts the safeguard gate.

Availability, April 23, 2026

| Surface | GPT-5.5 | GPT-5.5 Pro |
|---|---|---|
| ChatGPT Plus | Available | n/a |
| ChatGPT Pro | Available | Available |
| ChatGPT Business | Available | Available |
| ChatGPT Enterprise | Available | Available |
| Codex | Available | n/a |
| OpenAI API | Delayed | Delayed |

OpenAI cites "API deployments require different safeguards" as the reason for the delay.

Frequently Asked Questions

How much does GPT-5.5 cost per million tokens?

$5 input, $30 output, and $0.50 cached input per million tokens for standard GPT-5.5. GPT-5.5 Pro is announced at $30 input and $180 output per million tokens; OpenAI says it will arrive in the API "very soon" but it is not live at launch. Standard GPT-5.5 is exactly double GPT-5.4's $2.50 and $15.

Why does OpenAI's Terminal-Bench number differ from the leaderboard?

OpenAI reports 82.7% on its launch page, while the benchmark owner's leaderboard showed 82.0% plus or minus 2.2 the same day. The two figures are consistent within the confidence interval, but the gap between the headline number and the leaderboard is editorially notable on launch day.

Is GPT-5.5 cheaper to run than GPT-5.4?

No on list price, yes on output-token volume. Artificial Analysis finds GPT-5.5 xhigh runs about 20% more expensive on its index, but the model uses roughly 40% fewer output tokens, which can lower total cost per completed workflow depending on the task.
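The trade-off is easy to check against the list prices. A sketch under illustrative assumptions: a hypothetical task that consumes 50K input tokens and would emit 20K output tokens on GPT-5.4, with GPT-5.5 emitting 40% fewer.

```python
def task_cost(input_price, output_price, input_mtok, output_mtok):
    """USD cost of one task; prices per million tokens, counts in millions."""
    return input_price * input_mtok + output_price * output_mtok

# Hypothetical task: 0.05M input tokens; GPT-5.4 emits 0.02M output tokens.
gpt54 = task_cost(2.50, 15.00, 0.05, 0.02)         # $0.425
gpt55 = task_cost(5.00, 30.00, 0.05, 0.02 * 0.6)   # $0.61, with ~40% fewer output tokens
```

On a like-for-like task, doubled prices outweigh the token savings: output-side cost alone lands at 2× price × 0.6 tokens = 1.2×, matching the roughly 20% index gap. Any net savings per completed workflow would have to come from finishing tasks in fewer steps or retries, not from per-token math.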

How does GPT-5.5 compare to Claude Opus 4.7 on hallucination?

On Artificial Analysis's AA-Omniscience benchmark, GPT-5.5 hallucinates on 86% of items and Claude Opus 4.7 on 36%. The gap is wide enough to matter for factual Q&A workloads.

When will GPT-5.5 hit the API?

OpenAI has not given a date. The launch page says "API deployments require different safeguards," which reflects ongoing work after a universal jailbreak was identified during UK AI Safety Institute testing.

