Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

📅 2025-07-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the low correctness rate of formal proofs generated by large language models (LLMs) in Lean 4. We propose a verifier-driven reinforcement learning (RL) framework. Methodologically, we feed fine-grained feedback from the Lean 4 verifier—such as type errors and unresolved identifiers—directly into RL training for the first time, enabling the model to become self-aware of its errors and to correct them iteratively through multi-turn interaction, stabilized by a feedback token masking mechanism; we further combine chain-of-thought reasoning with a concise reward design. Evaluated on MiniF2F with a 7B-parameter model, our approach achieves 38.6% pass@128, outperforming Kimina-Prover-Preview-Distill-7B and DeepSeek-Prover-V2-7B by 3.2% and 2.0%, respectively. This demonstrates a significantly enhanced intrinsic ability of LLMs to discriminate and correct errors in formal reasoning.

📝 Abstract
We introduce Leanabell-Prover-V2, a 7B large language model (LLM) that produces formal theorem proofs in Lean 4 with verifier-integrated long chain-of-thought (CoT) reasoning. Following our previous work Leanabell-Prover-V1, we continue to post-train existing strong prover models for further performance improvement. In the V2 version, we mainly upgrade the reinforcement learning (RL) stage with feedback provided by the Lean 4 verifier. Crucially, verifier feedback—such as an indication of success or details of specific errors—allows the LLM to become "self-aware" of the correctness of its own reasoning process and to learn to reflexively correct errors. Leanabell-Prover-V2 directly optimizes LLM reasoning trajectories with multi-turn verifier interactions, together with feedback token masking for stable RL training and a simple reward strategy. Experiments show that Leanabell-Prover-V2 improves performance by 3.2% (pass@128) over Kimina-Prover-Preview-Distill-7B and 2.0% (pass@128) over DeepSeek-Prover-V2-7B on the MiniF2F test set. The source code, curated data, and models are available at: https://github.com/Leanabell-LM/Leanabell-Prover-V2.
Problem

Research questions and friction points this paper is trying to address.

Enhancing formal theorem proving using verifier-integrated reinforcement learning
Improving LLM reasoning via Lean 4 verifier feedback and error correction
Optimizing proof generation in Lean 4 with multi-turn verifier interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verifier-integrated reasoning with Lean 4
Reinforcement Learning with verifier feedback
Feedback token masking for stable RL
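The feedback token masking idea can be illustrated with a minimal sketch: during RL updates, tokens injected into the trajectory by the Lean 4 verifier are excluded from the loss, so gradients flow only through tokens the model itself generated. This is an assumption-laden illustration, not the paper's implementation; the function name `masked_nll` and its arguments are hypothetical.

```python
import math

def masked_nll(token_logprobs, feedback_mask):
    """Average negative log-likelihood over model-generated tokens only.

    token_logprobs: log-probability of each target token under the policy.
    feedback_mask: 1.0 for tokens the model generated, 0.0 for tokens that
    came from verifier feedback (these contribute no gradient signal).
    Hypothetical sketch of feedback token masking, not the paper's code.
    """
    losses = [-lp * m for lp, m in zip(token_logprobs, feedback_mask)]
    denom = max(sum(feedback_mask), 1.0)  # avoid division by zero
    return sum(losses) / denom

# Example: four tokens, the last two are injected verifier feedback.
loss = masked_nll([math.log(0.5)] * 4, [1.0, 1.0, 0.0, 0.0])
# Only the first two tokens contribute, so loss == -log(0.5) == log 2.
```

In an actual RL setup the same mask would be applied to per-token policy-gradient terms rather than a plain NLL, but the principle is identical: verifier-feedback tokens are treated as context, not as actions to be reinforced.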
Xingguang Ji
Klear Team, Kuaishou Technology
Yahui Liu
Klear Team, Kuaishou Technology
Qi Wang
Klear Team, Kuaishou Technology
Jingyuan Zhang
Klear Team, Kuaishou Technology
Yang Yue
Klear Team, Kuaishou Technology
Rui Shi
ByteDance, Inc.
Chenxi Sun
Klear Team, Kuaishou Technology
Fuzheng Zhang
Klear Team, Kuaishou Technology
Guorui Zhou
Unknown affiliation
Kun Gai
Senior Director & Researcher, Alibaba Group