🤖 AI Summary
This study addresses the low correctness rate of formal proofs that large language models (LLMs) generate in Lean 4, and proposes a verifier-driven reinforcement learning (RL) framework. Methodologically, fine-grained feedback from the Lean 4 verifier (such as type errors and unresolved identifiers) is fed directly into RL training, so that the model learns to judge the correctness of its own reasoning and to correct errors over multiple rounds of verifier interaction; a feedback token masking mechanism keeps this training stable. The framework further combines chain-of-thought reasoning with a simple reward design. Evaluated on MiniF2F with a 7B-parameter model, our approach achieves 38.6% pass@128, outperforming Kimina-Prover-Preview-Distill-7B and DeepSeek-Prover-V2-7B by 3.2% and 2.0%, respectively. These results suggest that verifier feedback improves an LLM's ability to detect and correct errors in its own formal reasoning.
📝 Abstract
We introduce Leanabell-Prover-V2, a 7B large language model (LLM) that produces formal theorem proofs in Lean 4 using verifier-integrated long chain-of-thought (CoT) reasoning. Following our previous work, Leanabell-Prover-V1, we continue to post-train existing strong prover models for further performance improvement. In V2, the main upgrade is to the reinforcement learning (RL) stage, which now uses feedback provided by the Lean 4 verifier. Crucially, verifier feedback, such as indicating success or detailing specific errors, allows the LLM to become "self-aware" of the correctness of its own reasoning process and to learn to reflexively correct errors. Leanabell-Prover-V2 directly optimizes LLM reasoning trajectories with multi-turn verifier interactions, together with feedback token masking for stable RL training and a simple reward strategy. Experiments show that Leanabell-Prover-V2 improves pass@128 on the MiniF2F test set by 3.2% over Kimina-Prover-Preview-Distill-7B and by 2.0% over DeepSeek-Prover-V2-7B. The source code, curated data, and models are available at: https://github.com/Leanabell-LM/Leanabell-Prover-V2.
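
The abstract does not spell out how feedback token masking is implemented. The following is a minimal sketch of one plausible reading, not the authors' code: tokens that the verifier inserted into a multi-turn trajectory are excluded from the training loss, so only tokens the model itself generated contribute. The names `Segment`, `build_loss_mask`, and `masked_mean` are hypothetical, and the per-token loss values stand in for whatever RL objective is actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One span of a multi-turn trajectory (hypothetical structure)."""
    token_ids: List[int]
    from_verifier: bool  # True if the span is verifier feedback, not model output


def build_loss_mask(segments: List[Segment]) -> List[int]:
    """Return 1 for model-generated tokens and 0 for verifier-feedback tokens."""
    mask: List[int] = []
    for seg in segments:
        mask.extend([0 if seg.from_verifier else 1] * len(seg.token_ids))
    return mask


def masked_mean(per_token_loss: List[float], mask: List[int]) -> float:
    """Average the per-token loss over unmasked (model-generated) tokens only."""
    if len(per_token_loss) != len(mask):
        raise ValueError("loss and mask must have the same length")
    kept = sum(mask)
    if kept == 0:
        # Assumption: a trajectory with no model tokens contributes no loss.
        return 0.0
    return sum(loss * m for loss, m in zip(per_token_loss, mask)) / kept
```

For a trajectory made of a three-token proof attempt, a two-token verifier error message, and a one-token revision, `build_loss_mask` yields `[1, 1, 1, 0, 0, 1]`, and any loss assigned to the two feedback positions is ignored by `masked_mean`. The intent, under this reading, is that the model is never trained to imitate text it did not produce.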