Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

📅 2025-07-11

🤖 AI Summary

This study addresses the low correctness rate of formal theorem proofs generated by large language models (LLMs) in Lean 4. We propose a verifier-driven reinforcement learning (RL) framework. Methodologically, we feed fine-grained feedback from the Lean 4 verifier, such as type errors and unresolved identifiers, directly into RL training, so that the model becomes aware of whether its own reasoning is correct and learns to fix errors through multi-turn interaction with the verifier. A feedback token masking mechanism keeps RL training stable, and the method combines long chain-of-thought reasoning with a simple reward design. On the MiniF2F test set, post-training 7B provers with this approach improves pass@128 by 3.2% when built on Kimina-Prover-Preview-Distill-7B and by 2.0% when built on DeepSeek-Prover-V2-7B, indicating that verifier feedback strengthens the model's ability to detect and correct errors in its own formal reasoning.

📝 Abstract

We introduce Leanabell-Prover-V2, a 7B large language model (LLM) that produces formal theorem proofs in Lean 4 with verifier-integrated long Chain-of-Thought (CoT) reasoning. Following our previous work, Leanabell-Prover-V1, we continue to post-train existing strong prover models to further improve their performance. In V2, we mainly upgrade the Reinforcement Learning (RL) stage with feedback provided by the Lean 4 verifier. Crucially, verifier feedback, whether it indicates success or details specific errors, allows the LLM to become "self-aware" of the correctness of its own reasoning process and to learn to reflexively correct its errors. Leanabell-Prover-V2 directly optimizes LLM reasoning trajectories with multi-turn verifier interactions, together with feedback token masking for stable RL training and a simple reward strategy. Experiments on the MiniF2F test set show that Leanabell-Prover-V2 improves pass@128 by 3.2% when built on Kimina-Prover-Preview-Distill-7B and by 2.0% when built on DeepSeek-Prover-V2-7B. The source code, curated data, and models are available at: https://github.com/Leanabell-LM/Leanabell-Prover-V2.
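To make the idea of "multi-turn verifier interactions" concrete, here is a minimal sketch of how one RL rollout could interleave model proof attempts with verifier feedback while recording which tokens should be trained on. It is not the authors' implementation: the `Policy`/`Verifier` interfaces, the `Trajectory` and `rollout` names, whitespace tokenization, the `max_turns` cap, and the binary 1.0/0.0 reward are all hypothetical choices made for illustration. The toy verifier and its error string are made up and do not call a real Lean 4 process.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the Leanabell-Prover-V2 code base):
# a policy maps the running context to its next proof attempt, and a
# verifier maps a proof to (success flag, feedback message).
Policy = Callable[[str], str]
Verifier = Callable[[str], Tuple[bool, str]]


@dataclass
class Trajectory:
    tokens: List[str] = field(default_factory=list)
    loss_mask: List[int] = field(default_factory=list)  # 1 = trained on, 0 = masked
    reward: float = 0.0

    def append(self, text: str, trainable: bool) -> None:
        # Whitespace splitting stands in for a real tokenizer.
        pieces = text.split()
        self.tokens.extend(pieces)
        self.loss_mask.extend([1 if trainable else 0] * len(pieces))


def rollout(statement: str, policy: Policy, verify: Verifier,
            max_turns: int = 3) -> Trajectory:
    """Alternate proof attempts and verifier feedback until success or max_turns."""
    traj = Trajectory()
    context = statement
    for _ in range(max_turns):
        attempt = policy(context)
        traj.append(attempt, trainable=True)    # model-generated: contributes to the loss
        ok, feedback = verify(attempt)
        if ok:
            traj.reward = 1.0                   # assumed binary outcome reward
            break
        traj.append(feedback, trainable=False)  # verifier output: masked from the loss
        context = "\n".join([context, attempt, feedback])
    return traj


if __name__ == "__main__":
    # Toy stand-ins: the first attempt fails, the second verifies.
    attempts = iter(["by bad_tactic", "by norm_num"])

    def toy_policy(context: str) -> str:
        return next(attempts)

    def toy_verifier(proof: str) -> Tuple[bool, str]:
        if proof == "by norm_num":
            return True, ""
        return False, "error: unknown tactic"

    traj = rollout("theorem t : 2 + 2 = 4 :=", toy_policy, toy_verifier)
    print(traj.loss_mask, traj.reward)
```

In this toy run, the failed attempt and the successful one are marked trainable, while the three feedback tokens in between are masked, so the printed mask is `[1, 1, 0, 0, 0, 1, 1]` with reward `1.0`. Keeping feedback text in the context but out of the loss lets the model condition on verifier messages without being trained to generate them.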

Problem

Research questions and friction points this paper is trying to address.

Enhancing formal theorem proving using verifier-integrated reinforcement learning

Improving LLM reasoning via Lean 4 verifier feedback and error correction

Optimizing proof generation in Lean 4 with multi-turn verifier interactions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Verifier-integrated reasoning with Lean 4

Reinforcement Learning with verifier feedback

Feedback token masking for stable RL
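Feedback token masking, as listed above, can be illustrated with a token-level policy-gradient loss in which positions holding verifier feedback are excluded. The sketch below assumes a plain REINFORCE-style surrogate, loss = -(1/|T|) Σ_{t∈T} A_t log π(a_t), averaged over the set T of unmasked tokens; the name `masked_pg_loss` and this particular objective are hypothetical simplifications, and the paper's actual RL objective (for example, a clipped PPO-style loss) may differ. Advantages are taken as given inputs.

```python
from typing import Sequence


def masked_pg_loss(logprobs: Sequence[float],
                   advantages: Sequence[float],
                   loss_mask: Sequence[int]) -> float:
    """Policy-gradient surrogate averaged over unmasked (model-generated) tokens.

    loss = -(1/|T|) * sum_{t in T} A_t * log pi(a_t), where T is the set of
    positions with loss_mask == 1. Masked positions (e.g. verifier feedback)
    contribute neither to the sum nor to the normalizer |T|.
    """
    if not (len(logprobs) == len(advantages) == len(loss_mask)):
        raise ValueError("logprobs, advantages and loss_mask must have equal length")
    active = [i for i, m in enumerate(loss_mask) if m]
    if not active:
        return 0.0  # nothing to train on in this trajectory
    total = sum(advantages[i] * logprobs[i] for i in active)
    return -total / len(active)


if __name__ == "__main__":
    logprobs = [-0.5, -1.0, -3.0, -2.0]
    advantages = [1.0, 1.0, 1.0, 1.0]
    mask = [1, 1, 0, 1]  # position 2 is a verifier-feedback token
    print(masked_pg_loss(logprobs, advantages, mask))  # 3.5 / 3 ≈ 1.1667
```

Because the masked position is dropped from both the sum and the normalizer, changing its log-probability has no effect on the loss. Under this assumption, gradients flow only through tokens the model itself generated, which is one plausible reason such masking helps keep RL training stable.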
