BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

📅 2025-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses low tree-search efficiency and high computational overhead in large-scale theorem proving with Lean 4. It proposes lightweight Best-First Search (BFS) as a viable alternative to complex tree-search methods such as Monte Carlo Tree Search (MCTS). Methodologically, it (1) fine-tunes the policy with Direct Preference Optimization (DPO) on state-tactic pairs annotated from compiler error feedback; (2) filters the data at each expert iteration round, dropping problems that beam search already solves so that training focuses on harder cases; and (3) adopts length normalization to encourage exploration of deeper proof paths. On the MiniF2F test set the approach scores 71.31, which the authors present as competitive with more complex tree-search methods. The work frames this as evidence that BFS can be competitive in formal reasoning when properly scaled, and it packages the method as a scalable expert iteration framework for theorem proving.

📝 Abstract
Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating proof search spaces. While the existing approaches primarily rely on value functions and Monte Carlo Tree Search (MCTS), the potential of simpler methods like Best-First Search (BFS) remains underexplored. This paper investigates whether BFS can achieve competitive performance in large-scale theorem proving tasks. We present BFS-Prover, a scalable expert iteration framework, featuring three key innovations. First, we implement strategic data filtering at each expert iteration round, excluding problems solvable via beam search node expansion to focus on harder cases. Second, we improve the sample efficiency of BFS through Direct Preference Optimization (DPO) applied to state-tactic pairs automatically annotated with compiler error feedback, refining the LLM's policy to prioritize productive expansions. Third, we employ length normalization in BFS to encourage exploration of deeper proof paths. BFS-Prover achieves a score of 71.31 on the MiniF2F test set and therefore challenges the perceived necessity of complex tree search methods, demonstrating that BFS can achieve competitive performance when properly scaled.
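
To make the search procedure in the abstract concrete, here is a minimal sketch of best-first search with a length-normalized priority. It is an illustration under assumptions, not the authors' implementation: the scoring rule (cumulative log-probability divided by depth raised to a power `alpha`), the `expand` callback standing in for the LLM policy plus the Lean checker, and the toy proof graph are all hypothetical.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# (tactic, resulting state, log-probability of the tactic under the policy)
Step = Tuple[str, Hashable, float]


def normalized_score(logprob_sum: float, depth: int, alpha: float) -> float:
    """Assumed scoring rule: cumulative log-prob divided by depth**alpha.

    Log-probs are negative, so a larger alpha penalizes long paths less.
    alpha = 0 recovers the plain cumulative log-probability.
    """
    return logprob_sum / (max(depth, 1) ** alpha)


def best_first_search(
    root: Hashable,
    expand: Callable[[Hashable], Iterable[Step]],
    is_goal: Callable[[Hashable], bool],
    alpha: float = 0.5,
    max_expansions: int = 1000,
) -> Optional[List[str]]:
    """Return the tactic sequence reaching a goal state, or None."""
    tie = itertools.count()  # tie-breaker so states are never compared
    # heapq is a min-heap, so the negated score is stored as the priority.
    frontier = [(0.0, next(tie), 0.0, root, [])]
    seen = {root}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, logp, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for tactic, child, step_logp in expand(state):
            if child in seen:
                continue
            seen.add(child)
            new_logp = logp + step_logp
            new_path = path + [tactic]
            score = normalized_score(new_logp, len(new_path), alpha)
            heapq.heappush(frontier, (-score, next(tie), new_logp, child, new_path))
    return None


# Toy proof graph (hypothetical): state -> candidate steps.
TOY = {
    "start": [("intro", "mid", -0.1), ("simp", "stuck", -2.0)],
    "mid": [("ring", "done", -0.2)],
    "stuck": [],
    "done": [],
}


def toy_expand(state):
    return TOY[state]


def toy_is_goal(state):
    return state == "done"


proof = best_first_search("start", toy_expand, toy_is_goal)
print(proof)  # ['intro', 'ring']
```

In this sketch, raising `alpha` shrinks the magnitude of the negative score for long paths, which is one way to realize the "encourage deeper proof paths" effect the abstract describes.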
Problem

Research questions and friction points this paper is trying to address.

Scalable Best-First Search for theorem proving
Improving sample efficiency with Direct Preference Optimization
Encouraging deeper proof exploration via length normalization
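
The DPO item above can be sketched as follows. The loss is the standard DPO objective for a single preference pair; how the pairs are built is an assumption for illustration (for the same proof state, a tactic the compiler accepted is preferred over one that raised an error). The `Attempt` record, the helper names, and the error string are hypothetical.

```python
import math
from collections import defaultdict
from typing import List, NamedTuple, Optional, Tuple


class Attempt(NamedTuple):
    state: str            # pretty-printed proof state
    tactic: str           # tactic proposed by the policy
    error: Optional[str]  # compiler error message, or None if accepted


def build_preference_pairs(attempts: List[Attempt]) -> List[Tuple[str, str, str]]:
    """Pair accepted and failing tactics from the same state.

    Returns (state, chosen_tactic, rejected_tactic) triples.
    """
    accepted = defaultdict(list)
    failed = defaultdict(list)
    for a in attempts:
        bucket = accepted if a.error is None else failed
        bucket[a.state].append(a.tactic)
    pairs = []
    for state, good in accepted.items():
        for chosen in good:
            for rejected in failed.get(state, []):
                pairs.append((state, chosen, rejected))
    return pairs


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


attempts = [
    Attempt("s0", "nlinarith [sq_nonneg x]", None),
    Attempt("s0", "simp_all", "error: (illustrative message)"),
    Attempt("s1", "norm_num", None),  # no failing sibling, so no pair
]
pairs = build_preference_pairs(attempts)
print(pairs)
```

The loss falls as the policy assigns relatively more probability to the accepted tactic than the reference model does, which matches the stated aim of steering the policy toward productive expansions.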
Innovation

Methods, ideas, or system contributions that make the work stand out.

Strategic data filtering
Direct Preference Optimization
Length normalization in BFS
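
The first item in this list, strategic data filtering, could look roughly like the sketch below for one expert iteration round. This is a guess at the control flow based only on the abstract: `beam_solves` and `bfs_prove` are hypothetical callables standing in for a cheap beam-search attempt and the full BFS prover, and the fine-tuning step that would consume the collected proofs is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

Proof = List[str]  # a proof as a sequence of tactics


def expert_iteration_round(
    problems: List[str],
    beam_solves: Callable[[str], bool],
    bfs_prove: Callable[[str], Optional[Proof]],
) -> Tuple[List[str], Dict[str, Proof]]:
    """Run one round: filter out easy problems, then search on the rest.

    Returns the retained (harder) problems and the proofs found for them.
    """
    # Drop problems the cheap beam-search pass already solves.
    hard = [p for p in problems if not beam_solves(p)]
    new_data: Dict[str, Proof] = {}
    for p in hard:
        proof = bfs_prove(p)
        if proof is not None:
            new_data[p] = proof  # would feed the next fine-tuning step
    return hard, new_data


# Toy stand-ins (hypothetical) to show the data flow.
def toy_beam_solves(problem: str) -> bool:
    return problem.startswith("easy")


def toy_bfs_prove(problem: str) -> Optional[Proof]:
    return ["tactic_a"] if problem == "hard_1" else None


hard, new_data = expert_iteration_round(
    ["easy_1", "hard_1", "hard_2"], toy_beam_solves, toy_bfs_prove
)
print(hard)      # ['hard_1', 'hard_2']
print(new_data)  # {'hard_1': ['tactic_a']}
```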