On Tuesday, a single plugin wiped $300 billion off the stock market. Not a product launch. Not a platform. A plugin, released on a Friday afternoon by a company with fewer employees than a mid-size law firm.
Anthropic's Cowork legal tool, dropped without fanfare into an open-source repository, sent Thomson Reuters down 15.8% in a single session. LegalZoom lost a fifth of its value. The WisdomTree Cloud Computing Fund, a proxy for the entire SaaS sector, has now shed more than 20% this year. A Goldman Sachs basket of US software stocks posted its worst day since April's tariff-fueled selloff.
Three days later, on Thursday, Anthropic released Opus 4.6. The company's new flagship model beats OpenAI's GPT-5.2 by 144 Elo points on GDPval-AA, an independent benchmark measuring performance on real-world knowledge work in finance, legal, and corporate domains. It scored 68.8% on ARC-AGI-2, nearly double its predecessor's 37.6%. It found over 500 previously unknown zero-day vulnerabilities in open-source code during testing, with no specialized prompting.
If you work in enterprise software, financial services, or legal tech, this is the company you should be watching. Not OpenAI. Not Google. The one with 2,000 employees and a twice-monthly Dario Vision Quest.
The 2,000-person wrecking ball
Anthropic shipped more than 30 products and features in January alone. That count comes from Bloomberg Opinion columnist Parmy Olson, who noted something the benchmarks miss. OpenAI has double the headcount. Microsoft runs on 228,000 employees, Google on 183,000. Yet Anthropic's tools for generating code and operating computers go beyond anything any of the three has managed to launch.
The Argument
• Anthropic's safety-obsessed culture eliminates internal friction, letting 2,000 employees outship Google, Microsoft, and OpenAI combined.
• Opus 4.6 beats GPT-5.2 by 144 Elo points on knowledge work and opens new fronts in finance, law, and cybersecurity.
• A single Cowork plugin triggered a $300 billion selloff in software stocks, proving the market now treats Anthropic as the most commercially dangerous AI company.
• The real test comes when Anthropic goes public. Mission-driven cultures tend to drift once Wall Street sets the priorities.
Claude Code hit $1 billion in annualized revenue last November, six months after going generally available. Forty-four percent of enterprises now run Anthropic models in production, up from near zero in early 2024, according to an Andreessen Horowitz survey from last month. Seventy-five percent of Anthropic's enterprise customers are already in production, not just testing.
The company is raising $10 billion at a $350 billion valuation, and may double the round to $20 billion to absorb a surge in investor demand. Anthropic is also preparing a tender offer at that same valuation, giving employees liquidity without waiting for the IPO everyone expects.
No safety research lab posts numbers like these. Something else is going on. Anthropic has turned its mission into a weapon, and the weapon fires faster than anything in Silicon Valley.
The army that fights for a cause
Here is the part that confuses people. Anthropic's founding story is about caution. Dario Amodei and a group of OpenAI researchers left because they thought Sam Altman's operation was too casual about existential risk. They built a company organized around alignment research, constitutional AI, and lengthy system cards. Twice a month, Amodei gathers staff for what the company calls a Dario Vision Quest. He has the harried look of a mad scientist, according to Bloomberg's Parmy Olson, and he speaks at length about building trustworthy AI and the geopolitical consequences of getting it wrong.
And yet this company is the one rattling markets.
Boris Cherny built Claude Code. He also gets recognized at airports now, which tells you something about how far a coding tool can travel. "Ask anyone why they're here," he told Bloomberg recently. "Pull them aside and the reason they will tell you is to make AI safe."
The connection between safety obsession and shipping speed is not a coincidence. It is the mechanism. Sebastian Mallaby, author of a forthcoming book on Google DeepMind, put it in military terms when speaking to Bloomberg: "Military historians often argue that the sense of fighting for a noble cause drives armies to perform better." OpenAI, he added, suffers from "the arrogance of the front-runner."
Armies with a cause do not hold committee meetings. They do not debate product strategy. They march. Anthropic's mission-driven culture eliminates the internal friction that bogs down bureaucracies at Google and Microsoft. No turf wars when everyone believes they are saving civilization. No product debates when the engineering team shares a single doctrine.
Sam Altman's company, by contrast, looks anxious and overextended. OpenAI is chasing consumer subscriptions, advertising revenue in ChatGPT, a hardware device, an app store, a $50 billion fundraise from Middle Eastern sovereigns, and an IPO. Double Anthropic's headcount. Less focus.
When Anthropic aired a Super Bowl ad this week mocking OpenAI's decision to test ads inside ChatGPT, Altman fired back on X, calling the spot "funny" but "clearly dishonest." He accused Anthropic of wanting "to control what people do with AI" while serving "an expensive product to rich people." The defensiveness was telling.
OpenAI released its Codex desktop app on Monday. Anthropic dropped Opus 4.6 on Thursday. The gap between those two announcements tells you something about organizational metabolism. One company launched an app. The other launched a model, agent teams, a PowerPoint integration, adaptive thinking, context compaction, and a cybersecurity research initiative. In the same week.
But here is the tension you should sit with. Anthropic's safety researchers warned last May that AI could eliminate up to 50% of entry-level office jobs within five years. Employment for recent computer science graduates has already declined 8% since 2022, according to Oxford Economics. Anthropic's own tools are now the ones most likely to accelerate that trend. The safety mission, it turns out, does not extend to employment.
Opus 4.6 opens every front at once
The model itself tells the story. Opus 4.6 is not a narrow coding upgrade. It is Anthropic opening new fronts in finance, law, and cybersecurity simultaneously, the way an army emboldened by early victories pushes into territory it would not have touched a year ago.
Start with finance, because that is where the money is. The model scores 60.7% on Vals AI's Finance Agent benchmark, state-of-the-art among frontier models. It improved 23 percentage points over Claude Sonnet 4.5 on Anthropic's internal finance evaluation. Hebbia's CTO said creating financial PowerPoints "that used to take hours now takes minutes." Anthropic's own finance blog says the model can produce a commercial due diligence report, the kind that takes a senior analyst two to three weeks, as a first-pass deliverable. If you run a financial services firm, that sentence should make you nervous.
Legal reasoning scored even higher. Harvey reported Opus 4.6 hit 90.2% on BigLaw Bench, the highest of any Claude model, with perfect scores on 40% of tasks. This matters because the legal tool is what triggered Tuesday's selloff. Now the model powering it just got measurably better.
Nobody paid much attention to the cybersecurity angle. They should have. Opus 4.6 dug up over 500 high-severity zero-day flaws in open-source libraries during pre-release testing, all on its own. Logan Graham, who runs Anthropic's frontier red team, told Axios that Claude invented new bug-hunting techniques after traditional fuzzing failed. For one Ghostscript vulnerability, the model turned to the project's Git commit history unprompted, then checked whether the same flaw existed elsewhere in the codebase. "I wouldn't be surprised if this was one of, or the main way, in which open-source software moving forward was secured," Graham said.
Underneath all of this sits a technical upgrade that makes the rest possible. Opus 4.6 is the first Opus-class model with a one-million-token context window, matching what Gemini has offered for over a year. On MRCR v2, a retrieval benchmark that buries information in massive documents, Opus 4.6 scored 76%. Sonnet 4.5 scored 18.5%. One model can work with your full codebase. The other forgets what it read 50,000 tokens ago.
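The practical difference is easy to picture in code. Here is a minimal sketch, assuming the standard Anthropic Python SDK and a hypothetical model id, of what a million-token window buys you: hand over a whole repository in one call instead of chunking it through a retrieval pipeline.

```python
# A minimal sketch of what a 1M-token window changes in practice: send an
# entire codebase in one request instead of chunking and retrieving.
# Illustration only; the model id and repo layout are assumptions.
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Concatenate every Python file under src/ into a single prompt.
codebase = "\n\n".join(
    f"# --- {path} ---\n{path.read_text()}"
    for path in sorted(Path("src").rglob("*.py"))
)

response = client.messages.create(
    model="claude-opus-4-6",  # assumed model id
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": f"{codebase}\n\nWhere does this codebase configure retries?",
    }],
)
print(response.content[0].text)
```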
And Claude Code can now split work across multiple agents that coordinate autonomously. One handles the frontend. Another owns the API. A third runs the migration. AI that works like a team, not a tool. That is where Anthropic is heading.
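Anthropic has not published the internals of agent teams, but the underlying pattern is familiar. A minimal sketch, assuming the Anthropic Python SDK, a hypothetical model id, and an invented task split: independent workers with their own contexts, fanned out in parallel, merged by a coordinator.

```python
# A sketch of the agent-team pattern, not Anthropic's implementation:
# each worker gets one scoped task and its own context; a coordinator
# fans the tasks out in parallel and merges the results.
from concurrent.futures import ThreadPoolExecutor

import anthropic

client = anthropic.Anthropic()

# Hypothetical task split for illustration.
SUBTASKS = {
    "frontend": "Build the React components for the settings dashboard.",
    "api": "Implement the REST endpoints the dashboard will call.",
    "migration": "Write the schema migration for the new settings tables.",
}

def run_agent(name: str, task: str) -> tuple[str, str]:
    """One worker: a single scoped task, isolated from the others."""
    response = client.messages.create(
        model="claude-opus-4-6",  # assumed model id
        max_tokens=4096,
        messages=[{"role": "user", "content": task}],
    )
    return name, response.content[0].text

# Fan out, then collect; a real coordinator would also resolve conflicts.
with ThreadPoolExecutor(max_workers=len(SUBTASKS)) as pool:
    results = dict(pool.map(lambda item: run_agent(*item), SUBTASKS.items()))

for name, output in results.items():
    print(f"=== {name} ===\n{output[:200]}")
```

The interesting engineering lives in the coordination layer this sketch waves away: shared state, conflict resolution, and deciding when one worker needs to see another's output.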
Who loses from here
The market already answered this question on Tuesday. Thomson Reuters, RELX, Wolters Kluwer, LegalZoom. FactSet dropped 10% on Thursday alone after the Opus 4.6 announcement. European legal and data services firms posted their worst single-day performances in decades. The SaaS sector looks cornered, squeezed between AI labs that ship faster every quarter and customers who now wonder what they are paying for.
Jensen Huang called the panic "illogical." JPMorgan's Mark Murphy, who covers US enterprise software, told Reuters it "feels like an illogical leap" to say a single LLM plugin could replace mission-critical software. They may be right about the timeline. They are wrong about the direction.
Scott White runs Anthropic's enterprise product. He framed it diplomatically. "We are excited to partner and actually lower the floor to get more value out of those tools." But the market heard what he did not say. If Claude can produce a due diligence report in minutes and a legal memo on the first pass, the value proposition of the software that currently mediates that work gets thinner every quarter.
Eighty percent of Anthropic's business comes from enterprise customers. The a16z data shows average enterprise LLM spend hit $7 million in 2025, up 180% from the year before, with projections at $11.6 million for this year. That money is coming from somewhere. Mostly from budgets that used to go to SaaS licenses.
EMarketer's Jacob Bourne told CNN that "panic over this is probably misplaced" in the short term. Security concerns alone will keep many large companies from handing Claude access to their files. Fair enough. But Bourne added the quiet part. "Legacy enterprise software providers are going to need to continue evolving." That is analyst-speak for "your moat is shrinking."
White, asked about Cowork's relationship to existing software, called it "the front door to getting hard work done." Front doors have a way of replacing the rooms behind them.
The culture question nobody wants to ask
Anthropic is now raising at a $350 billion valuation. At that price, investors are not buying safety research. They are buying growth, disruption, and the assumption that Claude will keep eating market share from every direction.
Parmy Olson noted the historical pattern. Google had "don't be evil." OpenAI started as a nonprofit that existed to "benefit humanity." We know how those stories ended. Anthropic's safety culture has held so far, in part because it doubles as a competitive advantage. But $350 billion creates its own gravitational pull.
Earlier this month, Amodei dropped a 20,000-word essay about the civilizational risks of AI. His company released the plugin that vaporized $300 billion in market cap the same week. Same building. Same investors writing the checks. Nobody inside Anthropic seems to find this strange, which is maybe the strangest part.
Watch what happens when the IPO window opens. That is the real test. Not whether Anthropic can ship. It has already proven that beyond anyone's expectations. The question is whether the safety mission survives contact with a public market that rewards exactly the kind of disruption Amodei spends his Vision Quests warning about. Armies that fight for a cause move faster. But armies that win tend to forget what they were fighting for.
On Tuesday, a plugin erased $300 billion. On Thursday, the model behind it got stronger. Somewhere in San Francisco, Dario Amodei is probably already drafting his next 20,000-word essay about why this should worry us. He is not wrong. But he is also not slowing down.
Frequently Asked Questions
Q: How big is Anthropic compared to OpenAI, Google, and Microsoft?
A: Anthropic has roughly 2,000 employees. OpenAI has about double that. Microsoft employs 228,000 and Google 183,000. Despite the size gap, Anthropic shipped over 30 products and features in January 2026 alone, and Claude Code hit $1 billion in annualized revenue within six months of launch.
Q: What benchmarks does Opus 4.6 lead on?
A: Opus 4.6 tops Terminal-Bench 2.0 for agentic coding, Humanity's Last Exam for multidisciplinary reasoning, and BrowseComp for information retrieval. It beats OpenAI's GPT-5.2 by 144 Elo points on GDPval-AA, which measures real-world knowledge work in finance and legal domains. It scored 68.8% on ARC-AGI-2, nearly double its predecessor's score.
Q: What caused the $300 billion stock selloff linked to Anthropic?
A: Anthropic released open-source plugins for its Cowork AI agent on a Friday, enabling automated tasks across legal, sales, and data analysis. Investors feared the tools could replace specialized enterprise software. Thomson Reuters fell 15.8%, LegalZoom dropped nearly 20%, and a Goldman Sachs basket of software stocks had its worst day since April.
Q: What are agent teams in Claude Code?
A: Agent teams let Claude Code split a project across multiple AI agents that work in parallel and coordinate autonomously. One agent might handle the frontend, another the API, and a third the database migration. The feature launched as a research preview alongside Opus 4.6.
Q: How did Opus 4.6 find 500 zero-day vulnerabilities?
A: During pre-release testing, Anthropic's frontier red team gave Opus 4.6 access to Python and standard vulnerability analysis tools but no specific instructions. The model found over 500 previously unknown high-severity flaws in open-source libraries, inventing new bug-hunting techniques when traditional fuzzing failed. Each flaw was validated by Anthropic staff or external researchers.