🧠 AI Thinks Less When It Should Think More

Good Morning from San Francisco,

Apple researchers just shattered AI's reasoning hype. Those fancy "thinking" models from OpenAI and Anthropic? They break down completely when puzzles get complex.

The twist stings. These models think less when problems get harder. Claude burns 20,000 reasoning tokens on simple puzzles, then drops to 5,000 on tough ones. It quits precisely when persistence matters most.

Researchers fed models exact solution algorithms. The AIs still failed at identical complexity walls. They can't execute their own step-by-step instructions.

Math benchmarks mislead everyone. Models score worse on 2025 tests than 2024 versions—despite humans finding them easier. Pure memorization masquerading as reasoning.

Bottom line: Current AI reasoning hits hard limits. More compute won't save it.

Stay curious,

Marcus Schuler


Apple study: Smart AI models get dumber when they need to think harder

Apple researchers discovered something troubling about AI's latest "thinking" models. These systems—like OpenAI's o3 and Claude's reasoning variants—collapse completely when problems cross certain complexity thresholds.

The study tested models on controlled puzzles instead of traditional math benchmarks. Unlike established tests that suffer from data contamination, these puzzle environments let researchers manipulate complexity precisely while tracking each step of the AI's reasoning process.

The results reveal three distinct zones. Simple problems favor standard language models over their "thinking" counterparts. The extra reasoning steps waste computational resources without improving results. For moderate complexity, thinking models gain an edge by working through problems step-by-step. But at high complexity, both model types fail completely.

The algorithm paradox

Even when researchers provided complete solution algorithms, models still failed at the same complexity thresholds. This suggests the problem isn't finding solutions but executing logical steps consistently.

Consider the Tower of Hanoi puzzle. When given the exact recursive algorithm to solve it, models performed no better than when working from scratch. They couldn't follow the steps laid out for them.
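To make the algorithm paradox concrete, here is a minimal sketch of the standard recursive Tower of Hanoi procedure in Python. It is our illustration of the kind of explicit, step-by-step recipe the researchers supplied; the paper's exact prompt format may differ.

```python
# Standard recursive Tower of Hanoi: move n disks from `source` to `target`
# using `spare` as the intermediate peg. A sketch of the textbook procedure,
# not the paper's exact prompt.
def hanoi(n, source, target, spare, moves):
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 disks on the spare peg
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves), moves)  # 7 moves; the minimum for n disks is 2**n - 1
```

Executing this recipe is pure bookkeeping, which is why failing at the same complexity wall with the algorithm in hand points to unreliable step execution rather than a lack of insight.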

Reasoning effort drops when needed most

The most counterintuitive finding: models reduce their reasoning tokens as problems become more complex, exactly when more thinking should help. This happens well before hitting computational limits.

For simple Tower of Hanoi puzzles, Claude 3.7 Thinking might use 20,000 tokens. But for harder versions requiring more moves, it drops to 5,000 tokens despite having 64,000 available. The model gives up rather than persisting through difficulty.
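A back-of-the-envelope sketch (ours, not the paper's) shows why shrinking effort is so damaging here: the minimum number of moves grows exponentially with the number of disks, so writing out a full solution quickly outruns any fixed token budget. The 50-tokens-per-move figure below is an assumed illustrative cost, not a measured one; the 64,000-token budget is the figure cited above.

```python
# Back-of-the-envelope arithmetic with assumed numbers, not measurements:
# the minimum Tower of Hanoi solution has 2**n - 1 moves, so the token cost
# of spelling it out explodes with n.
TOKENS_PER_MOVE = 50   # assumed cost of writing out one reasoning step
BUDGET = 64_000        # token budget cited for Claude 3.7 Thinking above

for disks in (3, 7, 10, 12, 15):
    min_moves = 2 ** disks - 1
    needed = min_moves * TOKENS_PER_MOVE
    verdict = "fits" if needed <= BUDGET else "exceeds"
    print(f"{disks:2d} disks: {min_moves:6,} moves ~ {needed:9,} tokens ({verdict} the budget)")
```

Under these assumptions a 12-disk puzzle already needs roughly 200,000 tokens of written-out steps, which is why spending fewer tokens as the puzzle grows is the opposite of what the task demands.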

Traditional AI benchmarks mislead because models have memorized their training data. They perform worse on AIME 2025 than on AIME 2024, even though humans find the newer test easier.

Why this matters:

  • AI reasoning models hit hard limits that more compute can't solve—they think less when problems get complex
  • These failures happen even when given explicit solution steps, revealing gaps in logical execution rather than problem-solving creativity



AI Image of the Day

Credit: Midjourney
Prompt:

A woman with the head of an Asian white crane, the woman is facing the camera and the crane head is covering half of her face following the woman head shape, black background, surreal photography by Tim Walker and Nona Limmen, fine art portrait, hyper-realistic details, studio lighting, soft shadows, low contrast, clean sharp focus, cinematic colour grading

💰 Meta’s $10B Reality Check: AI Needs Better Data

Meta wants to write a very large check to Scale AI. The social media company is discussing an investment in the artificial intelligence startup that could exceed $10 billion.

This would mark Meta's biggest external AI bet ever. The company usually builds AI tools in-house rather than buying them from others. But CEO Mark Zuckerberg has made AI his top priority, promising to spend up to $65 billion on AI projects this year.

Scale AI does the unglamorous but essential work of preparing data for AI training. The company employs thousands of contract workers who label images, clean up text, and organize information that AI models need to learn. Without good data, AI systems fail.
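For readers unfamiliar with what "labeling" produces, here is a purely illustrative toy record of the kind data-annotation pipelines emit. The field names, paths, and IDs are invented for this sketch and are not Scale AI's actual schema.

```python
# Toy example of a labeled training record (invented schema, not Scale AI's).
record = {
    "image_uri": "s3://example-bucket/frames/000123.jpg",      # hypothetical storage path
    "annotations": [
        {"label": "pedestrian", "bbox": [412, 180, 58, 140]},  # x, y, width, height in pixels
        {"label": "stop_sign", "bbox": [901, 95, 40, 40]},
    ],
    "annotator_id": "worker_7841",  # a contract worker supplied the labels
    "reviewed": True,               # a second pass checked the work
}
print(record["annotations"][0]["label"])  # -> pedestrian
```

Millions of records like this, checked and corrected by human workers, are what "preparing data for AI training" means in practice.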

The money chase heats up

Meta's rivals have already opened their wallets. Microsoft has pumped over $13 billion into OpenAI. Amazon and Google have each invested billions in Anthropic. Meta doesn't want to fall behind.

Scale AI has become a crucial player in the AI boom. The startup generated $870 million in revenue last year and expects to double that to $2 billion in 2025. It was valued at $14 billion in 2024, though recent talks suggested a $25 billion price tag.

Both companies also share an interest in military applications. They're already working together on Defense Llama, an AI system designed for the Pentagon.

Why this matters:

  • Meta is abandoning its go-it-alone AI strategy, admitting it needs outside help to compete
  • The deal shows how valuable boring data work has become in the AI gold rush



AI & Tech News

Duolingo CEO Says AI Won't Replace Workers Despite Going "AI-First"

Duolingo CEO Luis von Ahn faced fierce backlash after announcing the company would go "AI-first," with users threatening to cancel subscriptions over fears of mass layoffs. Von Ahn now says AI will help workers focus on creative tasks rather than replace them, though he admits a "very small number" of contractors doing repetitive work will lose their jobs.

Nvidia Boss Says UK Needs Better AI Infrastructure

Nvidia's Jensen Huang told the UK it has everything needed for AI success except the infrastructure to support it. Prime Minister Keir Starmer responded by announcing £1bn in new funding to boost Britain's computing power twentyfold, declaring the country can be "an AI maker, not an AI taker."

Microsoft's Gaming Handheld Arrives This Holiday Season

Microsoft and Asus just announced the Xbox Ally X, their answer to Valve's Steam Deck dominance. The device runs a stripped-down Windows with Xbox's game library front and center, promising to fix the clunky software that plagues most PC handhelds.

Shein Plans India Export Push Within a Year

Shein and partner Reliance plan to expand their Indian supplier base from 150 to 1,000 factories within a year and start selling India-made clothes globally. The move helps Shein dodge U.S. tariffs on Chinese imports while Reliance gets access to the retailer's manufacturing methods and global reach.

YouTube Quietly Loosens Content Rules

YouTube told content moderators to leave up more questionable videos as long as they discuss politics or social issues. The platform now allows half a video to break its rules instead of just a quarter, marking another retreat from content policing after Trump's return and Meta's similar moves.

Guardian Launches Secret Messaging Tool for Sources

The Guardian launched Secure Messaging, a tool that conceals whistleblower communications by making them look like regular app traffic from millions of users. The newspaper built the technology with Cambridge University and released the source code so other news organizations can use it as threats against journalists and their sources escalate worldwide.

Developer Chronicles AI's Complete Code Takeover

A Cloudflare engineer documented how Claude AI wrote nearly all the code for a production OAuth library, preserving every prompt in git commits. The process revealed AI excels at generating functional code but still needs human guidance for strategy, bug fixes, and finishing touches—though the collaboration produced working software in just two months.


🚀 AI Profiles: The Companies Defining Tomorrow

Scale AI: Data Labeling Giant Powers AI Revolution

Scale AI transforms raw data into machine-ready fuel for artificial intelligence. The San Francisco startup has become the invisible force behind self-driving cars, chatbots, and military AI systems.

1. The Founders: Founded in 2016 by MIT dropout Alexandr Wang (then 19) and Carnegie Mellon grad Lucy Guo in San Francisco. Wang became the world's youngest self-made billionaire at 25 after a 2022 funding round. Guo left in 2018 over product disagreements. The company now employs ~2,000 people globally, built on a vision of an "API for human labor" to feed AI's hunger for labeled training data.

2. The Product: Core strength is converting messy data into AI-ready datasets through hybrid human-machine workflows. The platform handles everything from autonomous-vehicle sensor data to chatbot training, serving enterprise clients that need massive scale and precision: think millions of driving-footage frames or government surveillance imagery. Recently expanded into AI model evaluation and safety testing. 🎯

3. The Competition: Battles Labelbox (software-focused), struggling legacy player Appen, and cloud giants' built-in tools like AWS SageMaker, while Chinese tech giants build internal teams. Differentiates through enterprise-grade quality and government contracts requiring security clearance. Scale's moat: a reputation for handling critical, sensitive projects where accuracy matters most.

4. Financing: Raised $1.6B total and was valued at $14B after 2024's $1B Series F. Backers include Accel, Meta, Amazon, and Nvidia. Reportedly seeking a $25B valuation in 2025 secondary sales. Profitable since 2020, rare for a hypergrowth startup. IPO speculation swirls as the company matures.

5. The Future: ⭐⭐⭐⭐ Riding the AI boom, perfectly positioned as a "data refinery" for every industry adopting AI. Expanding beyond labeling into model evaluation and safety, critical as governments demand AI auditing. The risk: AI might eventually need less human-labeled data, but Scale is betting that complex, high-stakes applications will always need human judgment.
