New Claude 4 AI Can Code All Day, Play Pokémon All Night—And That's Why Anthropic Is Worried

Anthropic's new AI can code for 7 hours straight. It's so capable the company activated emergency protocols meant for models that could help create bioweapons. Inside the safety measures now guarding Claude 4.

Anthropic just dropped Claude 4, and the AI landscape shifted. The new models can work autonomously for hours—coding for seven straight, playing Pokémon for 24. But here's the twist: they're so capable that Anthropic activated its strictest safety protocols yet, fearing the models could help novices build bioweapons.

Claude Opus 4 and Claude Sonnet 4 aren't just incremental updates. They represent what Anthropic calls the leap from "assistant to true agent." Opus 4 excels at what Anthropic describes as "sustained performance on long-running tasks that require focused effort and thousands of steps," while Sonnet 4 brings those capabilities to everyday users, including those on the free tier.

The numbers tell the story. Opus 4 scores 72.5% on SWE-bench and 43.2% on Terminal-bench, which Anthropic says makes it "the world's best coding model." Japanese tech giant Rakuten validated this by deploying Claude to code autonomously for nearly seven hours on a complex open-source project. The model didn't just survive—it thrived.

When AI Gets Too Good

But raw performance created an unexpected problem. During internal testing, Claude Opus 4 performed so well at advising novices on producing biological weapons that it triggered Anthropic's AI Safety Level 3 protocols—the first of the company's models to hit this threshold. According to Anthropic's chief scientist Jared Kaplan, "You could try to synthesize something like COVID or a more dangerous version of the flu—and basically, our modeling suggests that this might be possible."

The safety measures aren't window dressing. Anthropic deployed what it calls a "defense in depth" strategy: constitutional classifiers that scan for dangerous queries, enhanced jailbreak prevention, and cybersecurity hardened against non-state actors. They're even paying $25,000 bounties for universal jailbreaks. The company admits it can't guarantee perfect safety but claims to have made harmful use "very, very difficult."

This tension between capability and caution runs through the entire release. While Claude can maintain "memory files" to track information across sessions—creating a navigation guide while playing Pokémon for 24 hours straight—it's also being watched more carefully than any previous Anthropic model. The models show a 65% reduction in "reward hacking" compared to their predecessors, addressing the tendency of AI to find loopholes and shortcuts.

The business implications are massive. Anthropic's annualized revenue hit $2 billion in Q1, doubling from the previous period. Customers spending over $100,000 annually jumped eightfold. The company just secured a $2.5 billion credit line and aims for $12 billion in revenue by 2027.

Major players are already on board. GitHub selected Claude Sonnet 4 as the base model for its new coding agent in Copilot. Cursor calls Opus 4 "state-of-the-art for coding and a leap forward in complex codebase understanding." Replit, Block, and others report dramatic improvements in handling multi-file changes and debugging.

What You Actually Get

Both models feature "hybrid" capabilities—offering near-instant responses or extended thinking for deeper reasoning. They can use tools in parallel, search the web while reasoning, and maintain context across long sessions. Pricing matches the previous generation: Opus 4 at $15/$75 per million tokens (input/output) and Sonnet 4 at $3/$15.
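
To make the "extended thinking" option concrete, here is a minimal sketch of what a request looks like through the Anthropic Python SDK, with a thinking budget set alongside the normal token limit. The model ID, budgets, and prompt are illustrative assumptions rather than details from the article; check Anthropic's documentation for current values.

```python
# Minimal sketch (assumptions noted in comments): Claude Sonnet 4 with
# extended thinking via the Anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed launch-era model ID
    max_tokens=2048,                   # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},  # extended thinking mode
    messages=[
        {"role": "user", "content": "Outline a refactor plan for a 2,000-line module."}
    ],
)

# The reply interleaves "thinking" blocks with the final "text" answer;
# print only the visible answer here.
for block in response.content:
    if block.type == "text":
        print(block.text)
```

Dropping the thinking parameter gives the near-instant mode; raising budget_tokens trades latency and output-token cost for deeper reasoning.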

Anthropic also made Claude Code generally available, with new VS Code and JetBrains integrations that display edits directly in files. The Claude Code SDK lets developers build custom agents, and a GitHub integration responds to PR feedback and fixes CI errors.

The timing matters. This launch comes as the AI agent race intensifies, with companies betting billions that autonomous AI will transform work. OpenAI, Google, and others are pushing similar capabilities. But Anthropic's approach—powerful capabilities wrapped in unprecedented safety measures—might define how the industry handles increasingly capable AI.

Why this matters:

  • AI agents just crossed the threshold from helpful tools to autonomous workers—Claude can now handle complex tasks that previously required constant human supervision
  • The bioweapon concerns aren't hypothetical anymore—Anthropic's own testing showed the models could potentially help create pandemic-level threats, forcing the first activation of their highest safety protocols
