RSS GitHub
The Ledger A sourced historical record of AI

AlphaProof Published in Nature — Olympiad-Level Formal Math via RL

A ledger entry in the research archive, dated 2025-11-12.

Summary

Google DeepMind published the full technical description of AlphaProof in Nature (DOI 10.1038/s41586-025-09833-y), formalizing the system's silver-medal performance at IMO 2024 (28/42 points, solving 3 of 5 non-geometry problems). The paper detailed how AlphaProof used reinforcement learning against the Lean 4 theorem prover — generating millions of related problem variants to build up a self-play training curriculum — and why the approach produced proofs that were machine-verified and therefore logically guaranteed correct.

What Happened

AlphaProof's IMO 2024 result had been announced informally in July 2024; the Nature publication in November 2025 constituted the peer-reviewed scientific record with full architectural details.

The system operated in a loop. A language model (fine-tuned on formal mathematics) generated candidate Lean 4 proof steps. Lean's type checker acted as a perfect verifier, immediately accepting correct steps and rejecting incorrect ones — a reward signal with zero false positives. AlphaProof used this signal in a reinforcement learning loop, but the key innovation was the training distribution: rather than training only on competition problems (which number in the hundreds), the system generated millions of variants by systematically modifying known problems — changing constants, swapping lemmas, altering constraints — and used RL to solve these, building up a diverse library of formal proof experience.

At IMO 2024, AlphaProof was given problems translated into Lean by a human formalization team (the geometry problem required specialist formalization that the automated pipeline couldn't handle, which is why it was excluded). For the three problems it solved, AlphaProof's search ran for up to the maximum allowed time and returned machine-verified Lean proofs.

The silver medal score of 28/42 placed AlphaProof above roughly the top 50% of human competitors at the most selective mathematics competition in the world.

Why It Matters

The Nature publication made the technical contribution precise in ways that informal announcements had not. The RL-against-formal-verifier paradigm resolved a long-standing chicken-and-egg problem in automated theorem proving: you need interesting theorems to train on, but generating interesting theorems requires reasoning capability you don't yet have. AlphaProof's variant-generation approach created a self-bootstrapping curriculum that required only a modest seed of formally stated problems.

The guaranteed correctness property mattered at a level beyond benchmarks. A prose-proof AI can produce confident-sounding but subtly flawed arguments — mathematicians have noted this failure mode in all natural-language math generation systems. AlphaProof's Lean proofs were either correct or they didn't compile; there was no ambiguous middle ground. For a future in which AI assists with research-level mathematics, the ability to produce machine-checkable proofs is arguably more valuable than raw performance on competition problems. The companion Gemini Deep Think gold (July 2025, natural language) showed the two paradigms reaching comparable competition performance from opposite architectural directions.

§ How to read the metadata
Landmark
Fundamentally alters the trajectory; 2–5 per year.
Major
Meaningfully shifts the landscape; 2–4 per month.
Notable
Worth documenting; significance can be upgraded later.
Confidence
High = primary sources corroborate. Medium = credible secondary only. Low = provisional. Disputed = credible sources disagree.
Contestation
Uncontested = no formal challenge. Contested = at least one challenge open. Superseded = replaced by a later entry. Unresolved = dispute still open.

References

  1. Solving olympiad geometry without human demonstrations , Nature (Wed Nov 12 2025 00:00:00 GMT+0000 (Coordinated Universal Time)) primary document
  2. AI solves IMO problems at silver-medal level , Google DeepMind (Wed Nov 12 2025 00:00:00 GMT+0000 (Coordinated Universal Time)) official
  3. AI earns silver at the Mathematics Olympiad — what this means for maths , Nature News & Views (Wed Nov 12 2025 00:00:00 GMT+0000 (Coordinated Universal Time)) expert interpretation

See also