San Francisco | Friday, March 6, 2026
OpenAI shipped GPT-5.4 on Thursday and the benchmarks told an interesting story. Reasoning barely budged. Professional task completion jumped 12 points. The company stopped selling intelligence and started selling labor: a model that clicks through spreadsheets, navigates desktops, and matches or beats human professionals on 83% of benchmarked tasks.
Anthropic's Dario Amodei apologized for the leaked memo that called the Pentagon showdown political. Then he offered to keep Claude running on classified systems at cost. The Pentagon made the supply chain risk designation official anyway.
And for anyone who thinks their iPhone is just for scrolling: it is a full development terminal, if you set it up right.
Stay curious,
Marcus Schuler
OpenAI GPT-5.4 Scores 83% Against Human Professionals as Reasoning Gains Flatline

GPT-5.4 matches or outperforms human experts in 83% of professional tasks across 44 occupations. The reasoning benchmarks that defined the last two years of AI competition barely moved.
OpenAI released GPT-5.4 on Thursday with native computer use, a 1 million token context window, and ChatGPT embedded directly in Excel and Google Sheets. The model scores 83% on GDPval, OpenAI's internal benchmark for economically valuable work. GPT-5.1 scored 38.8% on the same test. Three model generations more than doubled the number.
The shift is in what improved. Reasoning and coding gains were incremental over GPT-5.2. Professional task execution, the ability to click through software, fill out forms, and run multi-step workflows, jumped 12 points. OpenAI also overhauled how the API handles tool calling with a system called Tool Search that cut token usage by 47% across 250 tasks and 36 MCP servers.
Individual claims are 33% less likely to be false and full responses 18% less likely to contain errors compared to GPT-5.2. The hallucination gains are real. The intelligence gains are not the story.
The GPT-5 line, which debuted with modest gains last August, has found its product thesis: not a smarter chatbot, but a cheaper worker.
Why This Matters:
- GPT-5.4's pivot from reasoning to task completion signals OpenAI now competes with outsourcing firms, not just AI labs
- The 47% token reduction in tool calling makes large agentic systems economically viable for the first time at scale
Reality Check
What's confirmed: 83% GDPval score across 44 occupations. Native computer use in the API and Codex. 33% fewer false claims vs GPT-5.2.
What's implied (not proven): That professional task completion translates to actual job displacement at enterprise scale.
What could go wrong: Mercor's APEX benchmark showed even the best models complete fewer than 25% of professional tasks on the first attempt. The 83% is OpenAI's test, not an independent one.
What to watch next: Enterprise deployment numbers over the next quarter. If Fortune 500 companies start replacing contractor workflows with GPT-5.4, the labor thesis is real.

The One Number
83%: The rate at which OpenAI's GPT-5.4 matches or outperforms human professionals across 44 occupations, per the company's GDPval benchmark released Thursday. GPT-5.1 scored 38.8% on the same test. Three model generations more than doubled a machine's odds of doing your job as well as you do.
Source: ZDNET / OpenAI
Anthropic CEO Apologizes for Leaked Memo, Offers Pentagon Claude at Cost

Dario Amodei said the leaked internal memo was a mistake. Then he published a 2,000-word blog post drawing exactly two red lines and offering everything else at a discount.
Anthropic received the formal supply chain risk designation from the Pentagon on Thursday, making official what Defense Secretary Pete Hegseth announced on social media last week. The company now has six months before federal agencies must stop using Claude on classified systems. OpenAI and xAI have already signed replacement deals.
Amodei's apology targeted the leaked Slack memo that suggested Anthropic was being punished for not donating to Trump. The blog post struck a different tone: Anthropic supports intelligence analysis, operational planning, cyber operations, and battlefield simulation. It draws the line at autonomous weapons and mass domestic surveillance. Two restrictions out of hundreds of possible military applications.
The company offered to provide Claude at "nominal cost" during the transition and cited 10 USC 3252, which requires "the least restrictive means necessary." Thirty former defense and intelligence officials wrote to Congress Thursday calling the designation a "dangerous precedent." Hundreds of OpenAI and Google employees signed a separate letter urging the DOD to reverse course.
Anthropic is the first American company ever publicly designated a supply chain risk. The label has traditionally been reserved for foreign adversaries.
Why This Matters:
- The designation sets a precedent that the government can blacklist domestic tech companies over contract disagreements, not security threats
- Anthropic's legal challenge under 10 USC 3252 will test whether supply chain risk authority can be used as a procurement weapon

AI Image of the Day

Prompt: Soft, natural shot of a grandfather and young girl asleep on a bus. Their heads rest on each other, hands gently touching. Sunlight streams through the windows, highlighting the peacefulness of the moment. The bus interior is bright and colorful, enhancing the tender connection.
Your iPhone Is a Full Dev Terminal if You Set It Up Right

Blink Shell, Mosh, and tmux turn an iPhone into a persistent connection to your development machine. No jailbreak. No workarounds. Just configuration.
Most developers treat their phone as a notification screen. The combination of Blink Shell (a terminal emulator with SSH and Mosh support), Mosh (a protocol that survives network switches and sleep), and tmux (a terminal multiplexer that keeps sessions alive) changes that.
The setup takes about 15 minutes. Install Blink Shell from the App Store. Install Mosh on your server and allow its UDP ports through the firewall (60000-61000 by default). Attach to a tmux session. Your terminal persists through subway tunnels, Wi-Fi handoffs, and hours of phone sleep. Pick up the phone, and you are back where you left off.
The practical use case: monitoring deployments, reviewing logs, running quick commands, or pairing with Claude Code from a park bench. Not a laptop replacement. A laptop extension that fits in your pocket and never drops the connection.
Why This Matters:
- Persistent mobile terminals eliminate the "I need my laptop" bottleneck for ops, SRE, and on-call developers
- Mosh's prediction-based protocol makes remote work viable on unreliable mobile networks
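For readers who want the connect step as a single command: a minimal sketch, assuming Mosh and tmux are already installed on the server. It composes a one-liner you could paste into Blink Shell. The user name, host name, and session name are hypothetical placeholders, and the helper function is ours, not part of any of the three tools.

```python
import shlex


def mosh_tmux_command(user: str, host: str, session: str = "main") -> str:
    """Build a one-liner that opens Mosh and lands in a persistent tmux session."""
    # `tmux new-session -A -s NAME` attaches to NAME if it exists, else creates it.
    remote = ["tmux", "new-session", "-A", "-s", session]
    # Everything after `--` is the command Mosh runs on the server.
    return shlex.join(["mosh", f"{user}@{host}", "--", *remote])


if __name__ == "__main__":
    # Hypothetical user and host, for illustration only.
    print(mosh_tmux_command("dev", "devbox.example.com"))
```

Because the tmux call attaches when the session exists and creates it otherwise, the same command works for the first connection and for every reconnect after that.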

🧰 AI Toolbox

How to Create AI Videos with Cinematic Camera Control Using Higgsfield
Higgsfield bundles multiple AI video, image, and audio models into one interface. Pick from Sora 2, Kling 3.0, or Higgsfield's own Soul models, then generate videos up to 30 seconds with precise control over character movement and camera angles. The platform includes lip sync, voice cloning, and over 100 visual effects. Free tier available.
Tutorial:
- Go to higgsfield.ai and create a free account
- Choose your starting point: text-to-video, image-to-video, or upload a reference image
- Select a model based on your goal: Kling 3.0 for motion control, Sora 2 for cinematic style, or Soul Cinema for film-grade imagery
- Write a prompt describing your scene, including camera movement and character actions
- Use Soul ID to create a consistent character you can reuse across multiple scenes
- Add audio with the built-in voice cloning or ElevenLabs integration
- Apply visual effects from the library and export your finished video
What To Watch Next (24-72 hours)
- Anthropic vs. Pentagon in court: Anthropic received the formal supply chain risk letter Thursday and has signaled a legal challenge under 10 USC 3252. A federal court filing could come as early as next week. Thirty former defense and intelligence officials wrote to Congress Thursday demanding an investigation.
- SXSW opens in Austin: SXSW EDU kicks off Monday, March 9, with the main festival running March 12-18. The pre-SXSW AI Startup Rodeo on March 10-11 brings AI founders and investors together before the official programming starts. Amy Webb delivers her annual Emerging Tech Trend Report.
- Nvidia GTC countdown: Jensen Huang keynotes March 16 at 11 AM PT from SAP Center in San Jose. He has promised to unveil a chip "the world has never seen." More than 30,000 attendees from 190 countries. Financial analyst Q&A on March 17.
🛠️ 5-Minute Skill: Turn Contract Redlines Into Negotiation Talking Points
Legal just sent back a vendor contract with 14 redlines. Your procurement lead needs to get on a call in two hours and negotiate the terms down. The redlines are dense, scattered across 30 pages, and nobody has time to read them twice. You need a one-page cheat sheet: what to push back on, what to concede, and what to trade.
Your raw input:
Vendor SaaS Agreement — Redlined by Legal (key excerpts)
Section 3.2 (Pricing):
REDLINE: Vendor wants 8% annual price escalator. Our standard cap is 3%.
Legal note: "Non-starter above 5%. Industry norm is CPI-linked."
Section 5.1 (Data Ownership):
REDLINE: Vendor claims joint ownership of "derived insights" from our
data. Legal note: "Absolutely not. We own all data and derivatives.
This is the hill we die on."
Section 7.3 (SLA):
REDLINE: Vendor offers 99.5% uptime SLA with service credits only.
Legal note: "Need 99.9% with right to terminate after 3 consecutive
months below threshold."
Section 8.1 (Termination):
REDLINE: 12-month lock-in with 6-month notice for non-renewal.
Legal note: "Push for 90-day notice. 6 months is excessive for a
SaaS contract."
Section 9.4 (Liability Cap):
REDLINE: Vendor caps liability at 12 months of fees paid.
Legal note: "Standard, but push for carve-outs on data breach and IP
infringement. Uncapped for those."
Section 11.2 (Auto-Renewal):
REDLINE: Contract auto-renews for successive 12-month terms.
Legal note: "Need opt-in renewal, not auto-renewal. Or at minimum,
60-day written notice window."
Section 12.1 (Governing Law):
REDLINE: Vendor wants Delaware law, we want California.
Legal note: "Low priority. Concede if needed as a bargaining chip."
The prompt:
You are a senior procurement advisor preparing a negotiation cheat sheet
from legal redlines. The procurement lead has 2 hours before the call
and needs to know exactly what to fight for, what to trade, and what to
concede.
From these redlines, produce:
1. Must-win items: non-negotiable positions with the business reason
(not just the legal reason)
2. Trade items: positions we'd like but can concede in exchange for
something else
3. Concession items: low-priority items to give up strategically
4. Suggested trades: specific "we'll give you X if you give us Y"
packages
5. Opening position vs. walk-away position for the top 3 issues
Legal redlines:
[paste redlines here]
Rules:
- Business reasons beat legal reasons on a negotiation call. Frame
everything in terms the vendor's sales rep understands: deal size,
renewal likelihood, timeline.
- For each must-win, include the consequence of losing it.
- Trades should feel fair to both sides. Lopsided proposals stall
negotiations.
- Keep the entire output under 300 words. This goes on one page, taped
to a monitor during the call.
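If you run this prompt more than once, a small helper can drop the redlines into the placeholder and check that the reply respects the one-page rule. This is a sketch only: the function names are hypothetical, the template shown is a shortened stand-in for the full prompt above, and sending the prompt to a model is left out.

```python
PLACEHOLDER = "[paste redlines here]"


def fill_prompt(template: str, redlines: str) -> str:
    """Swap the placeholder for the actual redline text."""
    if PLACEHOLDER not in template:
        raise ValueError("template is missing the redlines placeholder")
    return template.replace(PLACEHOLDER, redlines.strip())


def within_word_limit(reply: str, limit: int = 300) -> bool:
    """True if the reply is strictly under the limit ('under 300 words')."""
    return len(reply.split()) < limit


if __name__ == "__main__":
    # Shortened stand-in for the full prompt; illustration only.
    template = "Legal redlines:\n[paste redlines here]\nRules:\n- Keep the entire output under 300 words."
    redlines = "Section 3.2 (Pricing): Vendor wants 8% annual price escalator."
    print(fill_prompt(template, redlines))
    print(within_word_limit("word " * 299))
```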
What you get back:
Must-Win (Do Not Concede)
Data ownership (5.1): We own all data and derived insights. Full stop. Business reason: Joint ownership means the vendor can monetize our customer data or use our analytics to help competitors. If we lose this: Board-level data governance risk and potential regulatory exposure.
Uptime SLA (7.3): 99.9% with termination right after 3 consecutive months below threshold. Business reason: Our operations team runs real-time dashboards off this platform. 99.5% allows about 44 hours of downtime per year vs. about 9 hours at 99.9%. If we lose this: We absorb downtime cost with no recourse.
Price escalator (3.2): Cap at 5% annual maximum. Business reason: 8% compounds to 47% over 5 years. Our budget planning can't absorb that variance. If we lose this: Contract becomes unpredictable after year 2.
Trade Items (Bargaining Chips)
Termination notice (8.1): Push for 90 days, accept 120 if needed.
Auto-renewal (11.2): Push for opt-in, accept auto-renewal with 60-day notice window.
Liability carve-outs (9.4): Push for uncapped on data breach and IP. Accept 24-month cap as fallback.
Concession (Give Strategically)
Governing law (12.1): Concede Delaware. Low business impact, high perceived value to vendor.
Suggested Trades
- "We'll accept Delaware governing law and 120-day termination notice if you move data ownership to sole ownership and uptime to 99.9%."
- "We'll agree to auto-renewal with a 60-day opt-out window if you cap the annual escalator at 5%."
Top 3 Positions
Why this works
The prompt reframes legal language as business risk. Vendor sales reps don't respond to "Section 5.1 is problematic." They respond to "this puts the deal at risk." Structuring the output as must-win/trade/concede gives the procurement lead a clear hierarchy for the call. The suggested trade packages are the most valuable output because they turn a list of demands into a negotiation.
Where people get it wrong: Asking AI to "summarize these contract redlines." You'll get a shorter version of what legal already wrote. The procurement lead doesn't need a summary. They need a playbook: what to fight for, what to trade, and what to give away to win something bigger.
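Numbers on a cheat sheet get quoted on the call, so it is worth checking the model's arithmetic first. A minimal sketch of the two calculations behind the SLA and escalator lines, assuming a 365-day year and annual compounding (both function names are ours):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def downtime_hours(uptime_pct: float) -> float:
    """Hours per year a service may be down while still meeting the SLA."""
    return (1 - uptime_pct / 100) * HOURS_PER_YEAR


def compounded_increase_pct(annual_pct: float, years: int) -> float:
    """Total percentage price increase after compounding annually."""
    return ((1 + annual_pct / 100) ** years - 1) * 100


if __name__ == "__main__":
    for sla in (99.5, 99.9):
        print(f"{sla}% uptime allows {downtime_hours(sla):.1f} hours of downtime per year")
    for rate in (8, 5, 3):
        print(f"{rate}% escalator adds {compounded_increase_pct(rate, 5):.1f}% over 5 years")
```

Under these assumptions, 99.5% uptime allows about 43.8 hours of downtime a year and 99.9% about 8.8, while an 8% escalator compounds to roughly 46.9% over five years.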
What to use
Claude (claude.ai): Strongest at identifying which concessions have asymmetric value (low cost to us, high perceived value to them). Good at framing business consequences. Watch out for: May be overly cautious. Procurement calls require some confidence in your positions.
ChatGPT: Good at concise, punchy formatting that fits on one page. Strong at scripting exact trade language. Watch out for: Sometimes suggests trades that favor our side too heavily, which stalls real negotiations. Check that each trade feels balanced.
AI & Tech News
SpaceX Targets $1.75 Trillion IPO With Plans to Raise $50 Billion
SpaceX is preparing to go public at a $1.75 trillion valuation, aiming to raise $50 billion in what would be one of the largest IPOs in history. The figure is nearly nine times the company's $200 billion private valuation in October 2024.
Pentagon Tested OpenAI Models Through Microsoft Azure Before Military Ban Was Lifted
Current and former OpenAI employees told WIRED the Department of Defense experimented with OpenAI models through Microsoft's Azure cloud before OpenAI officially removed its ban on military use in January 2024. The allegations suggest the Pentagon accessed the technology through Microsoft's licensing arrangement, circumventing OpenAI's own usage policies at the time.
Google Quietly Releases Workspace CLI That Lets AI Agents Access Gmail, Drive, and Docs
Google published an unsupported command-line interface for Workspace that enables agentic AI tools to interact directly with Gmail, Calendar, Drive, and Docs. The move opens a direct pipeline for personal AI assistants to manage productivity apps programmatically.
Cursor Launches Automations for Event-Driven AI Coding Agents
Cursor rolled out Automations, a feature that triggers AI coding agents automatically from code changes, Slack messages, or scheduled timers. The tool reflects the broader push toward agentic development where autonomous agents handle tasks with minimal human oversight.
ByteDance Seedance 2.0 Hits Compute Wall as Users Wait Hours for AI Video
ByteDance's Seedance 2.0 video generation model is struggling with compute constraints that force users to wait hours for a single video. Copyright complaints are also piling up against the model as demand outstrips the infrastructure available to Chinese AI firms.
Cloverleaf Raises $300 Million to Broker Land and Power for AI Data Centers
Former Microsoft executive Brian Janous built Cloverleaf into a land and utility brokerage for AI data center developers, raising $300 million to expand. The company packages electricity access and real estate for tech firms scrambling to power AI infrastructure.
Senate Passes COPPA 2.0 Unanimously, Sending Children's Privacy Bill to the House
The Senate unanimously passed COPPA 2.0, a bill establishing new privacy protections for children's personal data online. The legislation faces an uncertain future in the House, which has failed to advance the bill despite repeated bipartisan Senate support.
UK Government Delays AI Copyright Rule Changes After Creative Industry Backlash
The UK government will delay proposed changes to copyright rules governing AI training data after a two-month consultation failed to reach consensus. Creative industries pushed back on proposals they said allowed unauthorized use of copyrighted material without adequate compensation.
Indonesia to Ban YouTube, TikTok, and Facebook for Children Under 16
Indonesia will prohibit children under 16 from accessing YouTube, TikTok, Facebook, Instagram, Threads, X, and Roblox starting March 28. Communication Minister Meutya Hafid classified the platforms as "high risk" for minors.
Epic Games CEO Explains Google Settlement, Says Apple Fight Continues
Tim Sweeney told GamesBeat the Google Play Store settlement locks in meaningful reforms for developers worldwide, nearly 2,030 days after both Google and Apple removed Fortnite. Sweeney said the Apple case remains ongoing because the two platforms present fundamentally different competitive circumstances.
🚀 AI Profiles: The Companies Defining Tomorrow
Validio

Validio catches bad data before it breaks AI models. The Stockholm startup monitors billions of records for anomalies and quality issues, then traces the source through automated lineage mapping. 🔍
Founders
Patrik Liu Tran founded Validio in 2019 and serves as CEO. Tran kept seeing ambitious AI projects stall or fail because the underlying data was unreliable. The company is headquartered in Stockholm, Sweden, and is now expanding into the US and UK.
Product
An intelligent data management platform that monitors data health, detects anomalies, and maps data lineage across billions of records automatically. Unlike traditional tools that require engineers to write fixed rules, Validio sets up in minutes and operates independently. The company claims 95% faster detection and resolution of data quality issues versus alternatives, and says customers need 90% fewer people to manage data quality. Target industries: financial services, manufacturing, and telecommunications.
Competition
Monte Carlo pioneered the "data observability" category and raised $135 million. Collibra focuses on data governance with a $5.25 billion valuation. Atlan, Bigeye, and Soda compete on different slices of the pipeline. Validio differentiates by serving both data engineers and business users from one platform. The risk: Monte Carlo and Collibra have deeper enterprise sales channels and larger customer bases.
Financing 💰
$30 million Series A led by Plural, with Lakestar, J12, and angel investors Kevin Ryan, Denise Persson, and Emil Eifrem participating. Total raised: $47 million.
Future ⭐⭐⭐
Every company building AI pipelines has the same dirty secret: the data feeding those models is messy, inconsistent, and poorly tracked. Validio's 800% ARR growth suggests the problem found its buyer. The $30 million gives runway for US, UK, and European expansion. But data quality is becoming a feature, not a product. Snowflake, Databricks, and the hyperscalers all want to own this layer. Validio needs to land enterprise accounts before the platform giants absorb the category. Clean data is the unglamorous foundation everything else depends on. Someone has to sell the plumbing. 🔧
🔥 Yeah, But...
Anthropic published a blog post Thursday contesting the Department of War's supply chain risk designation. The company drew exactly two red lines: no autonomous weapons, no mass domestic surveillance. It then listed everything it supports: intelligence analysis, operational planning, modeling and simulation, and cyber operations. Anthropic offered to provide Claude at "nominal cost" during the legal transition and cited the statute requiring "the least restrictive means necessary."
Sources: Anthropic, March 5, 2026
Our take: Anthropic published what reads like a Pentagon job application formatted as a principled stand. Two red lines, then 2,000 words listing everything else it would be delighted to help with: intelligence analysis, war planning, cyber ops, battlefield simulation. It offered Claude at "nominal cost" during the transition, which is corporate for "please don't leave." The blog quotes a statute requiring "the least restrictive means necessary," an interesting phrase for a company volunteering to cut its own price. Six months ago, Anthropic's brand was "the AI lab too cautious for the military." Now the pitch is: we'll do it cheap, we'll do it fast, we just won't build the Terminator. Two red lines out of a thousand. The safety company is running a clearance sale.
