Ravi Vakil had a question he couldn't quite answer. The Stanford algebraic geometer and current president of the American Mathematical Society had just finished proving a theorem about how spheres embed in flag varieties. The proof worked. It was, he told Quanta Magazine, "very elegant, correct, beautifully written." The authors could follow it line by line.
One problem. Vakil wasn't sure he'd written it.
Some of the proof came out of a private DeepMind system called FullProof, built on top of Google Gemini and not released to anyone outside the collaboration. "Who is that idea due to?" Vakil asked. "Is it due to us? Is it due to the model?" He thought about it. He said he probably would have gotten there eventually. Then he hesitated. "I think so. I'm not sure. I don't know."
That hesitation is the whole story. You can read it as a philosophical puzzle about authorship. You should read it as a distribution problem. The tools that helped Vakil finish his paper are not the tools the rest of the academy can use. And the gap between those two sets of tools just got priced.
The Breakdown
- AI models now prove genuine research-level theorems, but the most capable systems (AlphaEvolve, FullProof, OpenAI internal) are private and unreproducible.
- Harmonic closed $120M Series C at $1.45B valuation in November 2025, pricing Lean-based formal verification as a compliance moat.
- Mathematicians are leaving academia for labs: Ernest Ryu (UCLA to OpenAI), Ken Ono (Virginia to Axiom), and François Charton (to Axiom).
- The public tools (ChatGPT, Claude, Gemini) are breaking homework and drowning journals in AI slop while private tools do the serious work.
AI-generated summary, reviewed by an editor. More on our AI guidelines.
What actually happened in twelve months
Start with the numbers, because the numbers are why everyone is nervous.
In July 2025, both Google DeepMind and OpenAI announced that their models had scored 35 out of 42 points at the International Mathematical Olympiad, solving five of six problems under contest conditions. DeepMind said this was the first time a general-purpose reasoning system, working end-to-end in natural language, reached gold-medal standard. The 2024 silver required AlphaProof plus AlphaGeometry 2 plus specialized formal tooling. The 2025 gold required prompts.
Then came the research math. In November, Terence Tao and three collaborators posted a paper reporting that DeepMind's AlphaEvolve had been run against 67 open problems across combinatorics, number theory, and geometry. On 23 of them, it improved the best known solution. On 36, it matched the state of the art. The authors were careful about what this meant. AlphaEvolve, they wrote, "cannot produce deep new mathematical insights." It is a scale tool. What it scales is the hours a mathematician would otherwise spend searching.
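For a sense of what that kind of search looks like, here is a minimal sketch of the evolutionary loop that AlphaEvolve-style systems are built around: propose candidates, score them against an objective, keep the fittest, mutate, repeat. The toy objective and every name below are hypothetical; this is an illustration of the pattern, not DeepMind's code.

```python
# Minimal sketch of evolutionary search over candidate constructions.
# Illustrative only: the objective here is a toy; a real system would
# score candidate mathematical constructions (packings, bounds, etc.).
import random

def score(candidate: list[float]) -> float:
    # Placeholder objective: higher is better. Stands in for the
    # mathematical quantity a researcher actually wants to optimize.
    return -sum((x - 0.5) ** 2 for x in candidate)

def mutate(candidate: list[float]) -> list[float]:
    # Small random perturbation. An LLM-driven system would instead
    # rewrite the program that generates the candidate.
    out = candidate.copy()
    i = random.randrange(len(out))
    out[i] += random.gauss(0, 0.1)
    return out

def evolve(generations: int = 200, population_size: int = 32) -> list[float]:
    population = [[random.random() for _ in range(8)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)   # rank by objective
        survivors = population[: population_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in survivors]
    return max(population, key=score)

best = evolve()
print(f"best score: {score(best):.6f}")
```

The point of the sketch is the shape of the work: none of the individual steps require insight, but running them millions of times does require compute, which is exactly the resource the labs have and faculty offices do not.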
Ernest Ryu, who works in optimization theory at UCLA, used ChatGPT over about twelve hours spread across three days to prove a conjecture that Yurii Nesterov posed in 1983 and nobody had resolved in forty-two years. A few months later, Ryu took a leave of absence from UCLA and joined OpenAI as a member of the technical staff.
In February 2026, a benchmark called First Proof released ten research-level questions whose answers the authors had kept out of any training corpus. AI systems solved more than half. OpenAI's internal model, according to the company, likely got at least five correct: problems 4, 5, 6, 9, and 10.
That is the capability shift. The capture is the other story.
The useful tools are not the ones you can use
Here is what mathematicians outside the labs can actually touch. They can chat with ChatGPT, Claude, or the public version of Gemini. They can use DeepThink, the reasoning system DeepMind released for paid Gemini customers. If they have a Lean setup, they can run Harmonic's Aristotle through whatever access tier Harmonic grants them.
Here is what they cannot touch. They cannot run AlphaEvolve, which is how Tao and Gómez-Serrano ran 67 problems through a genetic search in a few months. They cannot run FullProof, which is how Vakil and his co-authors got the elegant proof of the base case. They cannot run the internal OpenAI model that apparently solved five First Proof problems, because OpenAI has not released it. The strong Prime Number Theorem formalization that Math Inc announced in March, comprising about 25,000 lines of Lean and a thousand-plus theorems, was produced by an agent called Gauss. Gauss is not a download. It is a company.
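For readers who have never seen formalized mathematics, this is roughly what a "line of Lean" is. The example below is a deliberately trivial Lean 4 theorem, not drawn from the Gauss output; a formalization on the scale Math Inc describes is tens of thousands of statements in this shape, each one mechanically checked.

```lean
-- Illustrative only: a trivially small Lean 4 theorem, not a line from
-- the Gauss corpus. A formal statement names its hypotheses and claim
-- precisely, and the proof term is verified by Lean's kernel.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```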
This is a new shape for mathematics. Results of the old kind, proved by humans on blackboards and refereed by other humans, were reproducible by anyone with the same blackboard. Results of the new kind are reproducible only by whoever holds the weights. The Vakil paper is a theorem in the literature, but the proof behind the proof is behind a private API. If a skeptic wanted to re-derive it from scratch, they would need an account that DeepMind has not offered for sale.
You can see why mathematicians feel cornered. The tools that move the work fastest are exactly the tools the discipline cannot inspect.
The talent is already gone
The second thing the last year has done is move the people.
Ryu, an optimization theorist with a good UCLA job, now works at OpenAI. Ken Ono, a number theorist at Virginia, is on leave to serve as the "founding mathematician" at Axiom Math. Tao himself has become a standing collaborator with DeepMind's math team, co-authoring the November AlphaEvolve paper. François Charton, who started pushing AI at mathematical problems in 2019, is now at Axiom. The pattern is old for machine learning. It is new for pure mathematics. Pure mathematicians, as a rule, do not leave.
They are leaving now because the labs are suddenly the only place where the best tools can be pointed at the work. A mathematician who wants to run AlphaEvolve against 67 problems in a long weekend cannot do that from a faculty office. They can do it from Mountain View. The institutional gravity is inverting. Akshay Venkatesh, the Institute for Advanced Study Fields Medalist, told Quanta that AI "risks causing mathematicians to lose direct experience with mathematical understanding." He is worried about culture. The labs are hiring.
There is a defensive mood inside the academy right now. You can hear it in the IAS faculty lounges and you can hear it in the First Proof editorial board's second batch announcement, which tightened the rules so that private, bespoke models cannot enter. "AI systems will only have one shot to answer each question," the board wrote. Referee reports will be public. Only systems available by API will be eligible. Translation: we are drawing a line around what counts as a result, and we are drawing it before the next demo lands.
Harmonic priced the moat
The capture has a number on it now.
In November 2025, Harmonic closed a $120 million Series C at a $1.45 billion post-money valuation, led by Ribbit Capital with Sequoia, Index, Kleiner Perkins, and Emerson Collective in the round. That is fourteen months after the company's $75 million Series A, a funding run that has roughly doubled the valuation with each round. Harmonic's pitch is not that it will outwrite Gemini or out-reason ChatGPT. Its pitch is that its flagship Aristotle model runs its outputs through Lean 4, the formal theorem prover. Formal verification, Harmonic argues, is the thing that makes mathematical AI trustworthy. An output that Lean accepts cannot be wrong in the way an LLM is usually wrong. It may be useless. It cannot be a fabrication.
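To make that claim concrete, here is a deliberately small Lean 4 illustration, not Harmonic's code: the kernel accepts a proof term only if it actually establishes the stated theorem, so a false statement simply fails to compile, and the "result" never enters the record.

```lean
-- Illustrative only, not Harmonic's code. Lean's kernel accepts a file
-- only if every proof term really establishes its stated theorem.

-- This checks: both sides reduce to the same numeral.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- This would be rejected: no term proves a false arithmetic identity,
-- so the file fails to build. (Uncommenting it breaks compilation.)
-- theorem not_a_proof : 2 + 2 = 5 := rfl
```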
You should think of the $1.45 billion as the market's first serious estimate of what formal verification is worth as a moat. Not as a proof of concept. As a toll booth. The argument is compliance-shaped. If you are a pharmaceutical company that needs an optimization result to be provably correct, or a cryptographer who needs a protocol that is verifiably secure, or an auditor who has been told by a regulator that AI outputs have to be checkable, Lean is the only story that currently holds. And Harmonic has spent three funding rounds making itself the company you call.
This is also why Math Inc's Gauss announcement matters beyond the Prime Number Theorem. Math Inc said it wants to grow the total body of formal code by two to three orders of magnitude in twelve months. That sentence is a land grab. Whoever owns the corpus of formally verified mathematics owns the input that the next generation of math agents will train on. Terence Tao told Quanta: "For the first time, it does feel like we could formalize a significant fraction of mathematics through AI." The first to do it gets the corpus. The rest get the license.
The slop is a feature, not a bug
While the useful work is going private, the unusable work is washing into the public commons, and the commons is the academy.
Joel David Hamkins at Notre Dame told Quanta he is "despairing of this ocean of slop that is overwhelming our journal systems." He stopped assigning homework. "A substantial fraction of the assignments students turn in are written by AI. I don't want to read it. I don't want to be the AI cop." Daniel Litt wrote on his blog that the new technology is "very likely bigger than the computer," a line other mathematicians keep quoting, but Litt also warned about the pollution.
Think about the structure here. The high-value work goes to wherever the best tools are, which is inside the labs. The training work goes to wherever the worst tools are, which is inside the classroom. Private systems produce proofs that ship in Annals papers. Public systems produce the homework that broke Hamkins. The pedagogy that made new research mathematicians possible is breaking at exactly the moment the research is accelerating. Ono, who is cheerful about research, told Quanta he is "deeply concerned about the role of AI in the future of work and training at all levels."
Something is going to give. Probably the pipeline.
Who wins, who loses, what changes
Harmonic wins if its bet on Lean-as-compliance holds. Math Inc wins if its corpus strategy works and it becomes the default formalization vendor. OpenAI and DeepMind win on recruiting, because they already are. Axiom wins if its $200 million reported raise (a figure that is currently single-sourced to an Ulam AI YouTube video and worth treating with caution) is real, and if it can keep hiring names like Ono and Charton.
The losers are obvious and quiet. Graduate programs in pure math, which depend on homework and apprenticeship, are already breaking. Mid-career researchers at places that cannot offer API access to the private tools will drift into a two-tier system where the papers they can write are structurally smaller than the papers the lab-affiliated can write. Venkatesh's cultural worry is real. So is its economic shape.
Watch three things. First, the First Proof second batch results in June 2026, which will be the first public head-to-head between Gemini 3.1 DeepThink and OpenAI's GPT 5.4 Pro on uncontaminated research problems, under one-shot API rules. That benchmark is designed to be a fair test. It will also be the moment the labs either defend their internal demos or concede they were inflated. Second, whether Harmonic's Aristotle access stays developer-gated or moves upstream into an enterprise compliance play, because that is the tell on whether Lean-as-moat is a business or a prestige object. Third, how many more Ernest Ryus quietly take leave from their tenured jobs before the summer.
Vakil's question was whose idea the proof was. The answer the market has started giving is that the idea belongs to whoever owns the system that produced it, and the system costs somewhere north of a billion dollars to build. The mathematician still gets to be an author. The lab gets to be a landlord.
That is the pricing. It is not obvious that the academy can afford the rent.
Frequently Asked Questions
What changed in research mathematics over the past twelve months?
Both Google DeepMind and OpenAI scored 35/42 at the 2025 International Mathematical Olympiad working end-to-end in natural language. Months later, AlphaEvolve improved best-known solutions on 23 of 67 research problems. A benchmark called First Proof saw AI systems solve more than half of ten uncontaminated research-level questions in February 2026.
Why does Harmonic's $1.45 billion valuation matter?
Harmonic's Aristotle model runs outputs through Lean 4, the formal theorem prover. The market priced that formal verification layer at $1.45 billion because Lean-verified outputs are structurally unable to fabricate results, which makes Harmonic the default call for any regulated industry that needs mathematical AI outputs to be checkable.
Which tools can mathematicians outside the labs actually use?
Public versions of ChatGPT, Claude, and Gemini, plus the paid DeepThink reasoning system. Not AlphaEvolve, not DeepMind FullProof, not OpenAI's internal First Proof model, and not Math Inc's Gauss agent. All of those remain private, which means the results they produce cannot be independently reproduced by anyone without lab credentials.
Why are mathematicians leaving academia?
Because the labs are now the only place where you can run the best tools against actual research problems. Ernest Ryu took leave from UCLA to join OpenAI after using ChatGPT to prove a 1983 Nesterov conjecture. Ken Ono took leave from Virginia to become Axiom Math's founding mathematician. François Charton is also at Axiom.
What should readers watch next?
The First Proof second batch results in June 2026, which will pit Gemini 3.1 DeepThink against OpenAI's GPT 5.4 Pro on uncontaminated problems under one-shot API rules. Also watch whether Harmonic's Aristotle moves from developer-gated access into enterprise compliance sales, and whether more tenured mathematicians take quiet leaves of absence.