🤖 AI Summary
This work addresses the limited reliability of large language models (LLMs) in mathematics research by having them generate formal proofs that the Lean theorem prover checks. It presents the first large-scale evaluation of this approach on open problems. The most capable agent autonomously resolved 9 of 353 open Erdős problems and formally proved 44 of 492 conjectures from the OEIS, and it is being deployed in several areas of mathematical research. A basic agent that alternates LLM-based generation with Lean-based verification replicated the Erdős successes but cost more on the hardest problems, which sheds light on the agent designs that make formal proof search effective.
📝 Abstract
Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. One mitigation is to have LLMs generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erdős problems at a per-problem cost of a few hundred dollars, proved 44 of 492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erdős successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.
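The basic agent described above, which alternates generation with verification, can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: `prove_loop`, `VerifyResult`, `toy_generate`, and `toy_verify` are hypothetical names, the generator stands in for an LLM call, and the verifier stands in for a Lean check. The abstract does not say how feedback is passed between rounds; returning checker feedback to the generator is an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class VerifyResult:
    ok: bool        # True if the checker accepted the candidate proof
    feedback: str   # checker output to hand back to the generator (assumed design)


def prove_loop(
    statement: str,
    generate: Callable[[str, str], str],
    verify: Callable[[str, str], VerifyResult],
    max_rounds: int = 8,
) -> Tuple[Optional[str], int]:
    """Alternate generation and verification until a proof is accepted.

    Returns (proof, rounds_used); proof is None if no candidate was accepted.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(statement, feedback)   # stand-in for an LLM call
        result = verify(statement, candidate)       # stand-in for a Lean check
        if result.ok:
            return candidate, round_no
        feedback = result.feedback                  # carry errors into the next round
    return None, max_rounds


# Toy stand-ins so the sketch runs without an LLM or a Lean installation.
def toy_generate(statement: str, feedback: str) -> str:
    # First attempt is a placeholder; after any feedback, try a real tactic.
    return "sorry" if not feedback else "by decide"


def toy_verify(statement: str, candidate: str) -> VerifyResult:
    # Hypothetical rule: reject any candidate that still contains `sorry`.
    if "sorry" in candidate:
        return VerifyResult(False, "proof contains sorry")
    return VerifyResult(True, "")


if __name__ == "__main__":
    proof, rounds = prove_loop("1 + 1 = 2", toy_generate, toy_verify)
    print(proof, rounds)
```

In a real system the two callables would wrap a model API and a Lean build. Only the verifier's verdict decides success, so an unreliable generator cannot produce a false positive.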