Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the sparse-reward problem and the high computational overhead of reinforcement learning for automated theorem proving, this paper introduces a "verifier-in-the-loop" paradigm: the Lean theorem verifier is integrated directly into the large language model's reasoning process, enabling immediate, step-wise correctness checks. Unlike conventional approaches that rely on full-trajectory rollouts and terminal feedback, the method uses step-level rewards and lightweight online verification-driven policy optimization, removing the need for trajectory annotations or costly rollouts. Experiments show substantial improvements in proof success rate and reasoning accuracy, along with fewer average reasoning steps and lower training cost. Crucially, local verification signals effectively guide global performance improvement, yielding a scalable and efficient framework for neural-symbolic theorem proving.

📝 Abstract
The most promising recent methods for AI reasoning require applying variants of reinforcement learning (RL) either on rolled out trajectories from the model, even for the step-wise rewards, or large quantities of human annotated trajectory data. The reliance on the rolled-out trajectory renders the compute cost and time prohibitively high. In particular, the correctness of a reasoning trajectory can typically only be judged at its completion, leading to sparse rewards in RL or requiring expensive synthetic data generation in expert iteration-like methods. In this work, we focus on the Automatic Theorem Proving (ATP) task and propose a novel verifier-in-the-loop design, which unlike existing approaches that leverage feedback on the entire reasoning trajectory, employs an automated verifier to give intermediate feedback at each step of the reasoning process. Using Lean as the verifier, we empirically show that the step-by-step local verification produces a global improvement in the model's reasoning accuracy and efficiency.
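The core mechanism in the abstract, replacing a single sparse terminal reward with an immediate per-step verification signal, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `check_step` is a hypothetical stand-in for a call to the Lean verifier, and the toy success rule and reward values (+1/-1) are assumptions.

```python
def check_step(proof_state, tactic):
    """Placeholder for Lean verification of a single tactic.

    Returns the new proof state on success, or None if the step
    fails to verify. In the real system this would invoke Lean;
    here a toy rule treats any non-empty tactic other than "sorry"
    as verifiable."""
    if tactic and tactic != "sorry":
        return proof_state + [tactic]
    return None

def stepwise_rewards(initial_state, tactics):
    """Assign a reward to every step as it is verified, instead of
    one terminal reward for the completed trajectory."""
    state, rewards = initial_state, []
    for tactic in tactics:
        new_state = check_step(state, tactic)
        if new_state is None:
            rewards.append(-1.0)  # immediate negative signal; stop the rollout
            break
        rewards.append(+1.0)      # verified step: immediate positive signal
        state = new_state
    return rewards

print(stepwise_rewards([], ["intro h", "apply h", "sorry"]))  # [1.0, 1.0, -1.0]
```

Because every step yields a reward the moment it is checked, the policy receives dense feedback without rolling out or annotating full trajectories, which is the source of the compute savings the abstract claims.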
Problem

Research questions and friction points this paper is trying to address.

High compute cost and time in AI reasoning methods
Sparse rewards in reinforcement learning for theorem proving
Need for intermediate feedback in automated theorem proving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verifier-in-the-loop design for ATP
Step-by-step local verification feedback
Empirical evidence that Lean-based step verification improves global reasoning accuracy and efficiency
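The property these contributions rest on is that Lean checks each tactic as it is applied, so an invalid step fails immediately rather than at the end of the proof. A small illustrative example (standard Lean 4 syntax, not taken from the paper):

```lean
-- Each tactic below is elaborated and checked the moment it runs,
-- which is exactly the step-level signal the verifier-in-the-loop uses:
example (p q : Prop) (hp : p) (hpq : p → q) : q := by
  apply hpq   -- checked here: the goal becomes `p`
  exact hp    -- checked here: the proof is complete
```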