Beyond Limited Data: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving

📅 2025-01-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address the performance bottleneck of large language models (LLMs) in formal theorem proving, which stems from the scarcity of high-quality training data, this paper proposes the Self-play Theorem Prover (STP), a self-play framework in which a single model takes on two roles that supply training signal to each other. As a *Conjecturer*, it generates new conjectures (often variants of known results) and is trained on those that are barely provable by the current prover, so its conjectures become more challenging over time. As a *Prover*, it attempts to prove the conjectures and is improved by standard expert iteration, i.e., fine-tuning on its own verified proofs. Together, the two roles counter the sparse-reward plateau that limits plain expert iteration. Proofs are checked with the Lean and Isabelle formal verifiers. With 19.8 billion tokens generated during Lean training, STP proves 26.3% of the statements in the LeanWorkbook dataset, roughly double the previous best of 13.2% obtained through expert iteration. The final model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (61.1%, pass@3200), ProofNet-test (23.1%, pass@3200), and PutnamBench (8 of 644 problems, pass@64).
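The pass@k figures above are conventionally read as counting a problem as solved if at least one of k sampled proofs is accepted by the verifier. A minimal Python sketch of that definition follows. The function name `pass_at_k` and the input layout (one list of per-attempt verifier outcomes per problem) are assumptions for illustration; some papers instead report an unbiased estimator computed from n ≥ k samples, and this summary does not say which variant is used.

```python
from typing import Sequence


def pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems with at least one verified proof among the first
    k attempts; results[i][j] is the verifier outcome of attempt j on problem i.

    Hypothetical helper for illustration, not code from the paper.
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    if k < 1:
        raise ValueError("k must be >= 1")
    solved = sum(any(attempts[:k]) for attempts in results)
    return solved / len(results)


# Toy outcomes for three problems (not data from the paper).
outcomes = [[False, True], [False, False], [True]]
print(pass_at_k(outcomes, 1))  # 1 of 3 problems solved on the first attempt
print(pass_at_k(outcomes, 2))  # 2 of 3 problems solved within two attempts

# PutnamBench's 8/644 at pass@64 corresponds to about 1.24% of the benchmark.
print(f"{8 / 644:.2%}")
```

Because a problem only needs one accepted proof, pass@k can only stay the same or rise as k grows, which is why large sampling budgets such as pass@3200 are reported alongside smaller ones such as pass@64.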

📝 Abstract
A fundamental challenge in formal theorem proving by LLMs is the lack of high-quality training data. Although reinforcement learning or expert iteration partially mitigates this issue by alternating between having the LLM generate proofs and fine-tuning it on the correct ones, performance quickly plateaus due to the scarcity of correct proofs (sparse rewards). To keep improving the models with limited data, we draw inspiration from mathematicians, who continuously develop new results, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP), which simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. The conjecturer is trained iteratively on previously generated conjectures that are barely provable by the current prover, which incentivizes it to generate increasingly challenging conjectures over time. The prover attempts to prove the conjectures with standard expert iteration. We evaluate STP with both the Lean and Isabelle formal verifiers. With 19.8 billion tokens generated during training in Lean, STP proves 26.3% of the statements in the LeanWorkbook dataset, doubling the previous best result of 13.2% achieved through expert iteration. The final model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (61.1%, pass@3200), ProofNet-test (23.1%, pass@3200) and PutnamBench (8/644, pass@64).
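The conjecture-then-prove loop described in the abstract can be sketched in Python as below. This is a minimal toy under stated assumptions, not the authors' implementation: the callables `generate_conjecture`, `sample_proof`, and `verify` are hypothetical stand-ins for the conjecturer, the prover, and a Lean/Isabelle verifier; `k = 8` samples per statement and the `max_pass_rate = 0.25` cutoff for "barely provable" are assumed values not taken from the paper; and the actual fine-tuning step is omitted, so the function only returns the two training sets.

```python
import random
from typing import Callable

# Hypothetical type aliases; real statements/proofs would be Lean or Isabelle text.
Statement = str
Proof = str


def empirical_pass_rate(stmt: Statement,
                        sample_proof: Callable[[Statement], Proof],
                        verify: Callable[[Statement, Proof], bool],
                        k: int) -> tuple[float, list[Proof]]:
    """Sample k proofs of stmt; return (fraction verified, verified proofs)."""
    verified = [p for p in (sample_proof(stmt) for _ in range(k))
                if verify(stmt, p)]
    return len(verified) / k, verified


def self_play_round(seeds: list[Statement],
                    generate_conjecture: Callable[[Statement], Statement],
                    sample_proof: Callable[[Statement], Proof],
                    verify: Callable[[Statement, Proof], bool],
                    k: int = 8,
                    max_pass_rate: float = 0.25):
    """One conjecture-then-prove iteration (toy sketch, not the STP code).

    Returns:
      prover_data: verified (statement, proof) pairs, i.e. the expert-iteration
        fine-tuning set for the prover role;
      conjecturer_data: (seed, conjecture) pairs whose conjecture was proved at
        least once but with pass rate <= max_pass_rate ("barely provable"
        under this assumed threshold), used to fine-tune the conjecturer role.
    """
    prover_data: list[tuple[Statement, Proof]] = []
    conjecturer_data: list[tuple[Statement, Statement]] = []
    for seed in seeds:
        # Prove the original statement, as in plain expert iteration.
        _, proofs = empirical_pass_rate(seed, sample_proof, verify, k)
        prover_data.extend((seed, p) for p in proofs)

        # Propose a variant and measure how hard it is for the current prover.
        conj = generate_conjecture(seed)
        rate, proofs = empirical_pass_rate(conj, sample_proof, verify, k)
        prover_data.extend((conj, p) for p in proofs)
        if 0.0 < rate <= max_pass_rate:
            conjecturer_data.append((seed, conj))
    return prover_data, conjecturer_data


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-ins: a "proof" is a random integer in [0, len(s)] and only "0"
    # verifies, so longer statements are harder to prove.
    def gen(s: Statement) -> Statement:
        return s + "'"

    def prove(s: Statement) -> Proof:
        return str(rng.randint(0, len(s)))

    def check(s: Statement, p: Proof) -> bool:
        return p == "0"

    pdata, cdata = self_play_round(["a", "abcd", "abcdefgh"], gen, prove, check)
    print(len(pdata), "verified proofs;", cdata, "kept as conjecturer data")
```

Selecting conjectures with a low but nonzero pass rate keeps the conjecturer's targets near the edge of the prover's ability: a conjecture that is never proved yields no verified proof to train on, while one that is almost always proved adds little new signal.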
Problem

Research questions and friction points this paper is trying to address.

LLM
Theorem Proving
Training Examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-play
Theorem Proving
Mathematical Conjectures