San Francisco | February 6, 2026
Anthropic's Opus 4.6 arrived with a million-token context window, agent teams that run parallel workstreams, and a Cowork plugin that writes PowerPoints and runs multi-step research. Professional services stocks cratered within hours. Thomson Reuters fell 15.8%, LegalZoom dropped 20%, and Goldman's software basket posted its worst session since April. One model release, $300 billion in market cap gone.
The same day, OpenAI shipped GPT-5.3-Codex with the highest coding benchmarks of the year. No API. Mac only. A $20 subscription wall. The best model in the room, and enterprise developers can't touch it.
Markets don't read benchmarks. They read shipping manifests.
Stay curious,
Marcus Schuler
Anthropic Opus 4.6 Triggers $300 Billion Selloff, Outships OpenAI With Half the Staff

A 2,000-person company just outshipped rivals with ten times the headcount and wiped $300 billion from professional services stocks in a single trading day.
Anthropic released 30+ features and products in January alone, culminating in Opus 4.6, a model that leads GPT-5.2 by 144 Elo points on GDPval-AA knowledge work benchmarks. The release triggered an immediate market reaction: Thomson Reuters dropped 15.8%, LegalZoom fell 20%, FactSet slid 10%, and Goldman Sachs' software basket posted its worst session since April. Combined losses exceeded $300 billion.
The numbers matter less than what drove them. Anthropic's Cowork plugin, released alongside Opus 4.6, handles PowerPoint creation, data analysis, and multi-step research tasks that overlap directly with professional services workflows. Wall Street didn't sell legal tech stocks because of a benchmark score. It sold them because Opus 4.6 demonstrated entry-level professional work at production quality.
Anthropic runs this operation with roughly 2,000 employees. OpenAI has 4,000. Microsoft has 228,000. Google has 183,000. Yet Anthropic shipped agent teams, adaptive thinking, context compaction to handle million-token windows, and a cybersecurity initiative that discovered 500+ high-severity zero-day vulnerabilities during testing, with the model inventing new bug-hunting techniques on its own.
The company's thesis is counterintuitive: safety culture reduces bureaucratic friction. Sebastian Mallaby's analysis compares it to military units that fight harder because they believe in the cause. Whether or not the analogy holds, the output is measurable. Claude Code hit $1 billion in annualized revenue within six months. Enterprise adoption sits at 44%, with 75% of those customers running Anthropic's most capable models in production.
Anthropic is now raising $10 billion at a $350 billion valuation, with reports the round could double to $20 billion. The average enterprise spends $7 million annually on LLMs, up 180% year over year.
Why This Matters:
- Professional services stocks lost $300 billion in a day because one AI company shipped a product that does entry-level knowledge work at production quality.
- Anthropic's headcount-to-output ratio suggests safety-driven culture is a shipping advantage, not a constraint.
✅ Reality Check
What's confirmed: Opus 4.6 leads GPT-5.2 by 144 Elo points on GDPval-AA. Professional services stocks lost $300 billion in one trading session.
What's implied (not proven): Safety culture drives Anthropic's shipping velocity, not just talent density or funding.
What could go wrong: Frontier model commoditization within 12 months erodes Anthropic's premium pricing.
What to watch next: Whether the $10 billion raise closes at $350 billion. Q1 enterprise LLM spend data.

The One Number
750 million — Monthly active users on Google's Gemini app, per Alphabet's Q4 2025 earnings. That's up from 650 million last quarter and within striking distance of ChatGPT's estimated 810 million. Google added 100 million chatbot users in three months. The AI assistant race has a second front-runner.
GPT-5.3-Codex Beats Claude on Coding Benchmarks, Ships Mac-Only With No API

GPT-5.3-Codex posts the highest coding benchmark scores of 2026. It also shipped without the one thing enterprise developers need most: an API.
OpenAI's GPT-5.3-Codex posts the best coding benchmark scores of any model released this year. Terminal-Bench 2.0: 77.3%. SWE-Bench Pro: 56.8%. Cybersecurity CTF: 77.6%. Claude Opus 4.6 trails at 65.4% on Terminal-Bench. On paper, OpenAI wins.
In practice, developers can't use it. Codex launched as a macOS-only desktop app with no API access. Using it requires a $20/month ChatGPT Plus subscription, a separate CLI installation, and a standalone IDE extension. Claude shipped with full API access ($5/$25 per million tokens), cross-platform support, and single authentication from day one.
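As a reference point for what "full API access" looks like in practice, here is a minimal sketch against Anthropic's Python SDK. The model string is a placeholder, and the cost math assumes the quoted $5/$25 means $5 per million input tokens and $25 per million output tokens; check the current documentation for both.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder model name for illustration; use whatever identifier your account exposes.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "List the open TODOs in the README pasted below.\n\n<paste README here>"}],
)

print(response.content[0].text)

# Rough cost estimate, assuming $5 per million input tokens and $25 per million output tokens.
usage = response.usage
estimated_cost = usage.input_tokens * 5 / 1_000_000 + usage.output_tokens * 25 / 1_000_000
print(f"Estimated cost for this call: ${estimated_cost:.4f}")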
The Andreessen Horowitz enterprise survey quantifies the gap. 75% of Anthropic's enterprise customers run its most capable models in production. For OpenAI, 46%. OpenAI still leads overall market share at 53%, but that's down from 62% in 2024. Anthropic climbed to 18% from 14%.
The feature gap runs deeper. Opus 4.6 ships with agent teams that run parallel workstreams, a million-token context window, and four effort control levels. Codex offers none of these. Its inference runs 25% faster, but speed matters less when the model can't access your codebase through an API.
Model quality is converging. The differences that determined market share in 2024, raw intelligence and coding accuracy, compress into margins most developers cannot feel. Distribution and enterprise readiness predict market outcomes better than benchmark scores.
Why This Matters:
- OpenAI's best coding model can't reach the developers who would pay for it, while Anthropic's ships ready for every enterprise environment.
- Benchmark leadership no longer predicts market share when the winning model doesn't have an API.

AI Image of the Day

Prompt: Full body shot of a near-future humanoid robot designed for music performance, slim and human-proportioned body, smooth matte white and soft gray materials, minimal mechanical exposure, gentle minimalist face display with no human skin, slightly translucent face panel, soft warm LED eyes, wearing simple modern casual clothing made of synthetic fabric, holding an acoustic guitar, calm emotional atmosphere, realistic modern rooftop or quiet urban background, natural lighting, cinematic photorealism, grounded near-future technology, subtle and elegant design, believable real-world engineering, high detail, ultra realistic
🧰 AI Toolbox

How to Access Pro Creative Apps for $13/Month with Apple Creator Studio
Apple Creator Studio bundles Final Cut Pro, Logic Pro, Pixelmator Pro, and other professional creative tools into one subscription. New AI features enhance each app for faster editing and production.
Tutorial:
- Open the App Store on Mac and search for "Creator Studio"
- Subscribe for $12.99/month or $129/year
- Download included apps: Final Cut Pro, Logic Pro, Pixelmator Pro, and more
- Use AI-powered features like automatic color grading in Final Cut
- Access Logic Pro's AI session players for music production
- Edit photos with Pixelmator's ML-powered selection and enhancement tools
- Sync projects across devices with iCloud integration
URL: Available in the Mac App Store
What To Watch Next (24-72 hours)
- Amazon: Q4 results landed last night with AWS AI infrastructure spending front and center. Today's market reaction to 2026 capex guidance, expected to top $150 billion, is the clearest signal yet of how the market is pricing Big Tech's AI buildout.
- Spotify and Cloudflare: Both report Monday, Spotify before the market close and Cloudflare after it. Spotify's AI-driven personalization metrics and Cloudflare's web traffic data on AI bot activity offer two different lenses on how AI reshapes consumer and infrastructure markets.
- WSJ Technology Council Summit: The Wall Street Journal convenes tech executives in Palo Alto on Monday. Expect candid takes on AI ROI timelines and enterprise adoption bottlenecks from the C-suite crowd that writes the checks.
🛠️ 5-Minute Skill: Turn Raw Customer Feedback Into a Product Spec
Your inbox has 47 unread messages from the customer success team, all forwarded with subject lines like "another complaint about onboarding" and "FYI from enterprise client." The feedback is scattered, contradictory, and mostly emotional. Your PM wants a spec by Monday. You need to go from noise to signal fast.
Your raw input:
From CS team, last 2 weeks:
- "the onboarding wizard is confusing, client ABC Corp gave up after step 3" - sarah
- "enterprise clients keep asking for SSO and we say 'coming soon' but it's been 6 months" - mike
- "three clients this week asked if they can import data from CSV. we only support API" - sarah
- "Acme Inc loves the product but says the dashboard loads too slowly for their team of 50" - jason
- "onboarding: clients don't understand what 'workspace' means. they keep creating multiple when they need one" - sarah
- "client X wants to export reports as PDF, currently only CSV export available" - mike
- "two enterprise trials churned last month citing lack of SSO as dealbreaker" - jason
- "CSV import is the #1 feature request on our public roadmap board (234 votes)" - sarah
- "dashboard performance: fine for teams under 10, falls apart above 30 users" - jason
- "onboarding completion rate dropped from 73% to 61% after last redesign" - sarah
The prompt:
You are a product manager writing a feature specification from raw customer feedback. Your audience is the engineering team, and they need clear problem statements, not just feature requests.
From this customer feedback, produce:
1. PROBLEM CLUSTERS (group the feedback into 3-4 distinct problems, each with a name and a 1-sentence definition)
2. PRIORITY RANKING (rank the clusters by business impact, with your reasoning in one sentence each)
3. For the TOP PRIORITY cluster only, write a mini-spec:
- Problem statement (2 sentences)
- Success metric (1 measurable outcome)
- User story (As a [who], I want [what], so that [why])
- Scope boundary (what's IN and what's OUT of this spec)
- Open questions for engineering (2-3 things you need answered before building)
Customer feedback:
[paste feedback here]
Rules:
- Separate what customers SAY they want from the underlying problem. "We need CSV import" might mean "we can't get our data in."
- If feedback contradicts itself, note the contradiction.
- Use customer names and quotes where they strengthen the argument.
- Don't spec solutions. Spec the problem and the constraint. Let engineering propose the how.
What you get back:
PROBLEM CLUSTERS
- Data ingestion gap — Clients can't get their data into the product without engineering resources. Three clients asked for CSV import this week; public roadmap shows 234 votes. Only API import exists today.
- Enterprise authentication blocker — SSO has been "coming soon" for six months. Two enterprise trials churned specifically over this. Active enterprise clients keep asking.
- Dashboard performance at scale — Dashboard works for small teams but degrades above 30 users. Acme Inc (team of 50) flagged this directly. Performance is fine under 10.
- Onboarding friction — Completion rate dropped from 73% to 61% after the last redesign. Clients abandon at step 3, get confused by "workspace" terminology, create duplicate workspaces.
PRIORITY RANKING
1. Enterprise authentication (SSO) — Direct revenue loss. Two churned trials with SSO cited as the dealbreaker. Every month without it costs qualified pipeline.
2. Data ingestion (CSV import) — Highest volume request (234 votes), blocks adoption for non-technical users who can't use the API.
3. Onboarding friction — 12-point completion drop is alarming, but it affects conversion rate, not churn of paying customers.
4. Dashboard performance — Real problem but narrow blast radius. Only affects teams above 30 users, which is a small segment today.
MINI-SPEC: Enterprise SSO
Problem statement: Enterprise prospects and trial users expect single sign-on as a baseline requirement, not a premium add-on. Two qualified enterprise trials churned in the last month naming SSO as the deciding factor, and the CS team has been telling clients "coming soon" for six months, eroding trust.
Success metric: Zero enterprise trial churn citing SSO as the primary reason within 90 days of launch.
User story: As an IT administrator at a 50+ person company, I want to connect my identity provider so that my team can log in with their existing corporate credentials without creating separate passwords.
Scope — IN:
- SAML 2.0 support (covers Okta, Azure AD, OneLogin)
- Admin UI for configuring SSO connection
- Automatic user provisioning on first SSO login
Scope — OUT:
- SCIM provisioning (phase 2)
- Multiple identity providers per account
- Custom SAML attribute mapping
Open questions for engineering:
- Can we support SAML 2.0 without rearchitecting our current session management?
- What's the estimated work if we use an off-the-shelf library (e.g., WorkOS) vs. building from scratch?
- Do we need to migrate existing enterprise users, or can SSO be additive alongside email/password?
Why this works
The prompt separates clustering from prioritizing from speccing. Most people skip the clustering step and jump to speccing whatever the loudest customer asked for. Forcing problem clusters first reveals that ten different feature requests might be three actual problems. The "separate what they say from the underlying problem" rule catches the difference between "I want CSV import" (a solution) and "I can't get my data in" (a problem).
Where people get it wrong: Turning feedback into a feature list. "Customers want X, Y, Z" doesn't help engineering. They need the problem, the constraint, and the success metric. Let them design the solution.
What to use
Claude (Claude Sonnet 4.5 via claude.ai): Best at identifying the underlying problem behind feature requests. Handles contradictory feedback well. Watch out for: Scope sections can be too conservative. May exclude things engineering would consider trivial to add.
ChatGPT (GPT-4o with "Reasoning" toggled on): Strong at structuring the mini-spec cleanly. Produces engineering-ready formatting. Watch out for: Tends to propose solutions in the problem statement. Watch for "we should build X" creeping in.
Bottom line: Claude for honest problem framing. ChatGPT for structured output. For this type of task, two models is enough. Skip Gemini unless you're processing raw survey data in bulk.
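If this becomes a weekly chore rather than a one-off, the same prompt can be scripted instead of pasted into a chat window. A minimal sketch using Anthropic's Python SDK, assuming the feedback sits in a plain text file; the file name, model string, and token limit are placeholders, and the prompt is abbreviated from the full version above.

import anthropic

PROMPT = """You are a product manager writing a feature specification from raw customer feedback.
Produce: 1) problem clusters, 2) a priority ranking with one-sentence reasoning, and 3) a mini-spec
for the top priority only (problem statement, success metric, user story, scope boundary, open questions).
Separate what customers SAY they want from the underlying problem. Don't spec solutions.

Customer feedback:
{feedback}"""

def feedback_to_spec(feedback_path: str) -> str:
    # Read the raw CS notes collected over the last two weeks.
    with open(feedback_path, encoding="utf-8") as f:
        feedback = f.read()
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; match whichever model you actually use
        max_tokens=2000,
        messages=[{"role": "user", "content": PROMPT.format(feedback=feedback)}],
    )
    return response.content[0].text

if __name__ == "__main__":
    print(feedback_to_spec("cs_feedback_last_2_weeks.txt"))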
AI & Tech News
SpaceX and xAI Complete $1.25 Trillion Merger
Elon Musk's SpaceX and xAI have completed a $1.25 trillion merger, one of the largest corporate deals in history. A technical breakthrough in orbital data centers last fall fast-tracked the combination of SpaceX's satellite capabilities with xAI's artificial intelligence technology.
Big Tech Plans Record $650 Billion Capital Spending in 2026
Alphabet, Amazon, Meta, and Microsoft have announced combined capex forecasts of approximately $650 billion for 2026, a 60% increase over last year. The surge is driven by data center construction as all four companies race to build AI infrastructure.
Goldman Sachs Partners With Anthropic to Deploy AI Agents in Banking
Goldman Sachs has partnered with Anthropic to develop AI agents that automate client vetting and onboarding at the investment bank. The collaboration signals Wall Street's deepening integration of AI into core compliance and customer service operations.
EU Declares TikTok's Addictive Design Illegal Under Digital Services Act
European regulators have issued preliminary findings that TikTok's infinite scroll and recommendation algorithm violate the Digital Services Act. The ruling targets design elements that lead to "compulsive" user behavior, with particular concern for children.
US and China Abstain From Global AI Weapons Declaration
A military AI summit concluded with just 35 countries signing a declaration affirming human responsibility over AI-powered weapons. The US and China both declined to participate, highlighting divisions among major powers over autonomous weapons governance.
Intel and AMD Warn Chinese Customers of Server CPU Shortages
Intel and AMD have notified Chinese customers of significant server CPU supply shortages, with Intel cautioning that delivery lead times could reach six months. The constraints add pressure to China's computing infrastructure at a time of growing US-China tech tension.
France Charges Four in China Espionage Case Targeting Starlink
French authorities have charged four individuals with spying for China, accusing them of attempting to obtain sensitive satellite data from Elon Musk's Starlink and other critical infrastructure. The case reflects growing concern over Chinese intelligence operations targeting Western space technology.
Sapiom Raises $15M to Let AI Agents Buy Their Own Software
Sapiom has secured $15 million in seed funding led by Accel to build a financial layer that lets enterprise AI agents automatically purchase technology services. The startup addresses an emerging need: as AI agents grow more autonomous, they need their own procurement infrastructure.
Indian Data Annotators Report Mental Health Crisis From AI Training Work
Women working as data annotators in rural India report significant psychological trauma from reviewing violent and explicit content used to train AI systems. The Guardian investigation highlights the hidden human cost of AI development outsourced to low-wage workers without mental health support.
USPTO Shifts Toward More Favorable Treatment of AI Patents
A key appeal has prompted the US Patent and Trademark Office to change its approach to AI patent applications, signaling more favorable treatment for machine learning inventions. The shift could lower barriers for AI innovators seeking patent protection in the US.
🚀 AI Profiles: The Companies Defining Tomorrow
Goodfire

Goodfire wants to crack open the black box of AI. The San Francisco company builds interpretability tools that let researchers see inside neural networks, debug their behavior, and design models that work as intended. 🔬
Founders
Tom McGrath co-founded Goodfire after building the interpretability team at Google DeepMind. He relocated to San Francisco and initially worked from a windowless office at South Park Commons. Nick Cammarata brought experience from OpenAI's interpretability research. Leon Bergen, a UC San Diego cognitive science professor, joined on leave from academia. Eric Ho runs the company as CEO. The founding team spans the labs that defined modern AI safety research.
Product
A model design environment with two pillars. The first maps training workflows and identifies design flaws before deployment. The second monitors models in production. Goodfire developed stochastic parameter decomposition, a technique that takes models apart component by component to determine what each part does. Early results: hallucinations cut by 50% in one project, and a novel class of Alzheimer's biomarkers discovered by reverse-engineering an epigenetic model built by healthcare partner Prima Mente. Mayo Clinic and Arc Institute are also design partners.
Competition
Anthropic, Google DeepMind, and OpenAI run interpretability teams internally. Weights & Biases and Arthur AI handle model monitoring from the outside. Open-source tools like TransformerLens offer free alternatives for researchers. Goodfire's edge: it packages interpretability as a commercial product, not a research paper or an observability dashboard. The risk: if frontier labs solve interpretability in-house, the market for third-party tools shrinks.
Financing 💰
$150M Series B at a $1.25B valuation. B Capital led. Juniper Ventures, Menlo Ventures, Lightspeed, DFJ Growth, Salesforce Ventures, Wing Venture Capital, and South Park Commons participated. Eric Schmidt invested personally. Total raised exceeds $200M.
Future ⭐⭐⭐⭐
Regulators want to know how AI decides. Enterprises deploying models in healthcare and finance need the same answers. Goodfire sells the flashlight. The risk: interpretability could become a feature inside every major model provider rather than a standalone market. But a unicorn valuation and Mayo Clinic on the partner list suggest buyers disagree. 🔍
🔥 Yeah, But...
OpenAI launched GPT-5.3-Codex on Thursday with a headline claim: "our first model that was instrumental in creating itself." The team used early versions to debug training, manage deployment, and diagnose its own evaluations. The system card acknowledges the model could aid cyberattacks.
Sources: NBC News, February 5, 2026 | OpenAI, February 5, 2026
Our take: For a decade, AI safety researchers have warned that recursive self-improvement, the moment an AI starts building better versions of itself, is the capability to watch most carefully. OpenAI put it in a press release as a selling point. The model debugged its own training run, optimized its own infrastructure, and analyzed its own performance gains. The system card, buried below the benchmarks, notes it could help hackers. Meanwhile, the product lead called the progress "crazy." He meant it as a compliment. The safety community read the same word differently. Somewhere between the marketing copy and the system card, there's a company having two conversations at once and hoping nobody notices they contradict each other.
