🤖 AI Summary
This work addresses the challenge of weak generalization and inefficient search in LLM-based automated theorem proving, which stems from scarce high-quality training data. We propose a proof-state-space sampling method for synthetic data generation that systematically covers diverse intermediate proof states and tactic combinations, and we introduce an adaptive beam-width control mechanism that dynamically balances exploration and exploitation during tree search. The generated data enables one-shot fine-tuning of the policy model, eliminating the need for iterative refinement or reinforcement learning. Our approach achieves 60.74% Pass@1 on MiniF2F and 21.18% on ProofNet, substantially outperforming state-of-the-art baselines. Key contributions are: (1) a scalable, semantically rich paradigm for constructing synthetic theorem-proving data; and (2) a data-driven framework that jointly optimizes search and training.
📝 Abstract
Recent advancements in large language models (LLMs) have sparked considerable interest in automated theorem proving, and a prominent line of research integrates stepwise LLM-based provers into tree search. In this paper, we introduce a novel proof-state exploration approach for training data synthesis, designed to produce diverse tactics across a wide range of intermediate proof states, thereby facilitating effective one-shot fine-tuning of an LLM as the policy model. We also propose an adaptive beam size strategy, which effectively takes advantage of our data synthesis method and strikes a trade-off between exploration and exploitation during tree search. Evaluations on the MiniF2F and ProofNet benchmarks demonstrate that our method outperforms strong baselines under the stringent Pass@1 metric, attaining an average pass rate of $60.74\%$ on MiniF2F and $21.18\%$ on ProofNet. These results underscore the impact of large-scale synthetic data in advancing automated theorem proving.
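To make the adaptive beam size idea concrete, below is a minimal illustrative sketch of a beam search over proof states whose width adapts at each depth. All names (`adaptive_beam_width`, `expand`, `is_proved`) and the specific control rule — widen the beam when the top candidate scores are close (high uncertainty), narrow it when one tactic clearly dominates — are assumptions for illustration; the abstract does not specify the paper's actual mechanism.

```python
import heapq

def adaptive_beam_width(scores, base_width=2, max_width=4, margin=0.2):
    """Hypothetical control rule: widen the beam when the top two
    candidate scores are within `margin` of each other (explore),
    otherwise keep the narrower base width (exploit)."""
    top = sorted(scores, reverse=True)
    if len(top) < 2 or top[0] - top[1] < margin:
        return max_width   # scores are close: keep more candidates
    return base_width      # one tactic dominates: prune harder

def beam_search(initial_state, expand, is_proved, max_depth=8):
    """Tree search over proof states. `expand(state)` returns
    (step_score, next_state) pairs, one per candidate tactic
    (e.g. tactics sampled from an LLM policy); cumulative scores
    rank the candidates retained at each depth."""
    beam = [(0.0, initial_state)]
    for _ in range(max_depth):
        candidates = []
        for score, state in beam:
            if is_proved(state):
                return state
            for step_score, nxt in expand(state):
                candidates.append((score + step_score, nxt))
        if not candidates:
            return None  # search space exhausted
        width = adaptive_beam_width([s for s, _ in candidates])
        beam = heapq.nlargest(width, candidates, key=lambda c: c[0])
    return None
```

In this sketch, a closed-goal proof state is detected by `is_proved`, and the beam width is recomputed at every depth rather than fixed in advance, which is one simple way to trade exploration against exploitation during search.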