OProver: A Unified Framework for Agentic Formal Theorem Proving

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing theorem-proving approaches: they use agentic mechanisms only at inference time, with no closed-loop feedback during training. The authors propose OProver, a framework that integrates agentic proof search into the training loop. Failed proofs are iteratively refined using Lean 4 compiler feedback and retrieved verified proofs, and training combines continued pretraining with iterative post-training. The process also produces OProofs, a dataset of proof trajectories, error feedback, and corrective repairs. OProver-32B achieves the best Pass@32 on MiniF2F (93.3%), ProverBench (58.2%), and PutnamBench (11.3%), and ranks second on MathOlympiad (22.8%) and ProofNet (33.2%), giving it more top placements than any prior open-weight whole-proof prover.
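To make the Pass@32 figures concrete, here is a minimal sketch of how a Pass@k score is commonly computed. It assumes a problem counts as solved when at least one of its first k attempts is accepted by the verifier. The page does not spell out the paper's exact evaluation protocol, so the function name `pass_at_k` and the toy data are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one verified proof among the first k attempts.

    `results` maps a problem id to per-attempt outcomes (True = proof verified).
    This is the simple "any success within k" reading of Pass@k (an assumption).
    """
    if not results:
        raise ValueError("no problems given")
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return solved / len(results)


# Hypothetical toy data: four problems with a handful of attempts each.
toy = {
    "p1": [False, True, False],
    "p2": [False, False, False],
    "p3": [True],
    "p4": [False, False, True],
}

score_k1 = pass_at_k(toy, 1)
score_k3 = pass_at_k(toy, 3)
```

A larger k can only keep or raise the score, which is why results are compared at a fixed budget such as 32.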

📝 Abstract

Recent progress in formal theorem proving has benefited from large-scale proof generation and verifier-aware training, but agentic proving is rarely integrated into prover training, appearing only at inference time. We present OProver, a unified framework for agentic formal theorem proving in Lean 4, in which failed proof attempts are iteratively revised using retrieved compiler-verified proofs and Lean compiler feedback. OProver is trained through continued pretraining followed by iterative post-training: each iteration runs agentic proving, indexes newly verified proofs into OProofs and the retrieval memory, uses repair trajectories as SFT data, and uses unresolved hard cases for RL. OProofs is built from public Lean resources, large-scale proof synthesis, and agentic proving traces, containing 1.77M Lean statements, 6.86M compiler-verified proofs, and serialized trajectories with retrieved context, failed attempts, feedback, and repairs. Across five benchmarks, OProver-32B attains the best Pass@32 on MiniF2F (93.3%), ProverBench (58.2%), and PutnamBench (11.3%), and ranks second on MathOlympiad (22.8%) and ProofNet (33.2%), more top placements than any prior open-weight whole-proof prover.
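The iteration described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the verifier is a stand-in for the Lean 4 compiler, retrieval is naive word overlap, and every name here (`agentic_prove`, `run_iteration`, `toy_prover`, `toy_verify`, `max_rounds`) is hypothetical. It only shows how outcomes might be routed: verified proofs go to the retrieval memory, repaired trajectories to SFT data, and unresolved statements to the RL pool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Verifier: returns None on success, or an error message (stand-in for Lean 4 feedback).
Verifier = Callable[[str, str], Optional[str]]
# Prover: (statement, retrieved proofs, previous attempt, feedback) -> new attempt.
Prover = Callable[[str, List[str], Optional[str], Optional[str]], str]


@dataclass
class Trajectory:
    statement: str
    retrieved: List[str]
    steps: List[Tuple[str, Optional[str]]] = field(default_factory=list)  # (attempt, error)
    solved: bool = False


def retrieve(statement: str, memory: Dict[str, str], top_k: int = 2) -> List[str]:
    """Naive retrieval: rank stored proofs by word overlap with the statement."""
    words = set(statement.split())
    ranked = sorted(memory.items(),
                    key=lambda kv: len(words & set(kv[0].split())),
                    reverse=True)
    return [proof for _, proof in ranked[:top_k]]


def agentic_prove(statement: str, prover: Prover, verify: Verifier,
                  memory: Dict[str, str], max_rounds: int = 4) -> Trajectory:
    """Propose a proof, then revise it using verifier feedback until it checks."""
    traj = Trajectory(statement, retrieve(statement, memory))
    prev, feedback = None, None
    for _ in range(max_rounds):
        attempt = prover(statement, traj.retrieved, prev, feedback)
        feedback = verify(statement, attempt)
        traj.steps.append((attempt, feedback))
        if feedback is None:
            traj.solved = True
            break
        prev = attempt
    return traj


def run_iteration(statements: List[str], prover: Prover, verify: Verifier,
                  memory: Dict[str, str]) -> Tuple[List[Trajectory], List[str]]:
    """One post-training iteration: returns (SFT trajectories, RL statements)."""
    sft_data, rl_pool = [], []
    for s in statements:
        traj = agentic_prove(s, prover, verify, memory)
        if traj.solved:
            memory[s] = traj.steps[-1][0]   # index the newly verified proof
            if len(traj.steps) > 1:         # at least one failed attempt was repaired
                sft_data.append(traj)
        else:
            rl_pool.append(s)               # unresolved hard case
    return sft_data, rl_pool


# Toy stand-ins (assumptions): a proof is "correct" iff it equals "ok:" + statement.
def toy_verify(statement: str, attempt: str) -> Optional[str]:
    return None if attempt == "ok:" + statement else "error: proof does not check"


def toy_prover(statement, retrieved, prev, feedback):
    if "hard" in statement:
        return "sorry"                      # never succeeds
    if "easy" in statement or feedback is not None:
        return "ok:" + statement            # succeeds at once, or after feedback
    return "sorry"


memory: Dict[str, str] = {}
sft, rl = run_iteration(["easy a", "medium b", "hard c"], toy_prover, toy_verify, memory)
```

In this toy run the easy statement is solved on the first try and only indexed, the medium one is repaired after one round of feedback and becomes SFT data, and the hard one exhausts its rounds and lands in the RL pool.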

Problem

Research questions and friction points this paper is trying to address.

agentic proving

formal theorem proving

proof generation

verifier feedback

prover training

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic theorem proving

iterative post-training

retrieval-augmented proof repair