SorryDB: Can AI Provers Complete Real-World Lean Theorems?

📅 2026-03-03
🤖 AI Summary
This work addresses the gap between current AI theorem provers' strong performance on static benchmarks and their limited effectiveness in real-world formal mathematics projects, which involve evolving dependencies and complex requirements. To bridge this gap, we introduce SorryDB, a dynamic, continuously updated benchmark derived from 78 real-world Lean projects on GitHub. Its open-ended proof tasks mitigate test-set contamination and better reflect mathematicians' actual workflows. We evaluate a range of approaches, including general-purpose large language models, agentic methods (e.g., an agent built on Gemini Flash), specialized symbolic provers, and a curated list of Lean tactics, on a snapshot of 1,000 tasks. Our results show that the Gemini Flash-based agentic approach achieves the best overall performance, yet it does not strictly outperform the other approaches, highlighting both their complementary strengths and the limitations of current techniques.
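To make "open proof task" concrete, here is a toy illustration in Lean 4. It assumes, as the name SorryDB suggests, that a task is a declaration whose statement is written but whose proof is left as `sorry`. The theorem names and the tactic list are hypothetical and not taken from the benchmark; real tasks depend on the surrounding project's definitions and imports, which is exactly what static benchmarks tend to leave out. The second declaration sketches how a curated-tactic baseline might attempt such a goal.

```lean
-- Hypothetical open task: the statement is in place, the proof is `sorry`.
-- Lean accepts this but reports a "declaration uses 'sorry'" warning.
theorem toy_task (a b : Nat) : a + b = b + a := by
  sorry

-- Hypothetical curated-tactic baseline: `first` tries each tactic in order
-- and keeps the first one that succeeds. Each tactic listed here either
-- closes the goal or fails, so none can leave the goal partially solved.
-- `rfl` and `decide` fail on this goal; `omega` (built into recent Lean 4)
-- closes it as a linear-arithmetic fact over `Nat`.
theorem toy_task_closed (a b : Nat) : a + b = b + a := by
  first | rfl | decide | omega
```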

📝 Abstract
We present SorryDB, a dynamically updating benchmark of open Lean tasks drawn from 78 real-world formalization projects on GitHub. Unlike existing static benchmarks, which are often composed of competition problems, hill-climbing on SorryDB will yield tools that are aligned with the community's needs, more usable by mathematicians, and more capable of understanding complex dependencies. Moreover, by providing a continuously updated stream of tasks, SorryDB mitigates test-set contamination and offers a robust metric for an agent's ability to contribute to novel formal mathematics projects. We evaluate a collection of approaches, including generalist large language models, agentic approaches, and specialized symbolic provers, on a selected snapshot of 1,000 tasks from SorryDB. We show that current approaches are complementary: even though an agentic approach based on Gemini Flash is the most performant, it is not strictly better than other off-the-shelf large language models, specialized provers, or even a curated list of Lean tactics.
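The claim that one approach is "not strictly better" than another, and that approaches are "complementary", can be stated precisely as set comparisons over the task IDs each approach solves. The sketch below assumes results are available as one set of solved task IDs per approach; the function names and the toy data are hypothetical and are not SorryDB's API or its reported results.

```python
# Toy sketch: compare approaches by the sets of task IDs they solve.
# Function names and data are hypothetical, not SorryDB's API or results.


def strictly_better(a: set[str], b: set[str]) -> bool:
    """True if approach `a` solves every task `b` solves, plus at least one more."""
    return b < a  # `<` on sets is the proper-subset test


def unique_solves(results: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each approach, the tasks that no other approach solves."""
    unique = {}
    for name, solved in results.items():
        # Union of everything solved by the other approaches.
        others = set().union(*(s for n, s in results.items() if n != name))
        unique[name] = solved - others
    return unique


def union_coverage(results: dict[str, set[str]]) -> set[str]:
    """Tasks solved by at least one approach (an oracle portfolio)."""
    return set().union(*results.values())


# Illustrative toy data only.
results = {
    "agent": {"t1", "t2", "t3"},
    "llm": {"t2", "t4"},
    "tactics": {"t1", "t5"},
}

print(strictly_better(results["agent"], results["llm"]))  # False: "t4" is missed
print(unique_solves(results))
print(len(union_coverage(results)))  # 5
```

Under this framing, the best single approach fails to be strictly better whenever `strictly_better` returns `False` against some other approach, and the approaches are complementary when several of them have a nonempty `unique_solves` entry, so that the union covers more tasks than any one of them alone.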
Problem

Research questions and friction points this paper is trying to address.

AI provers
Lean theorems
formal mathematics
real-world formalization
theorem proving
Innovation

Methods, ideas, or system contributions that make the work stand out.

SorryDB
dynamic benchmark
formal theorem proving
Lean
AI provers
Authors
Austin Letson (Axiomatic AI)
Leopoldo Sarra (Axiomatic AI); research interests: foundation models for science, self-supervised learning, AI4Science, statistical physics
Auguste Poiroux (Math, Inc.)
Oliver Dressler
Paul Lezeau (Imperial College)
Dhyan Aranha (University of Amsterdam)
Frederick Pu (University of Toronto)
Aaron Hill
Miguel Corredera Hidalgo (ENSEIRB-MATMECA, INP-Bordeaux)
Julian Berman (Columbia University)
George Tsoukalas (PhD student, UT Austin); research interests: automated theorem proving, mathematical reasoning, program synthesis, neurosymbolic programming
Lenny Taelman (University of Amsterdam)