🤖 AI Summary
This work addresses the low efficiency and rigid proof strategies of domain-specific theorem provers (e.g., DeepSeek-Prover-v1.5-RL) in formal mathematical reasoning. We propose ProofCompass, a training-free collaborative framework that leverages large language models (LLMs) to generate natural-language proof strategies and perform failure-path diagnostics, thereby guiding the specialized prover in problem decomposition, intermediate lemma selection, and directed search. Our key contribution is the first zero-shot integration of an LLM as a decoupled strategy controller and diagnostic agent, operating without fine-tuning or architectural modification to either the LLM or the prover, thus preserving both formal correctness and computational efficiency. On the miniF2F benchmark, ProofCompass achieves a 55.3% proof success rate with only 128 attempts, a 25x reduction relative to the baseline's 3200, demonstrating substantial gains in both effectiveness and scalability.
📝 Abstract
Language models have become increasingly powerful tools for formal mathematical reasoning. However, most existing approaches rely exclusively on either large general-purpose models or smaller specialized models, each with distinct limitations, while training specialized large models still requires significant computational resources. This paper introduces ProofCompass, a novel hybrid methodology that achieves remarkable computational efficiency by strategically guiding an existing specialized prover, such as DeepSeek-Prover-v1.5-RL (DSP-v1.5), with a Large Language Model (LLM), without requiring additional model training. The LLM provides natural language proof strategies and analyzes failed attempts to select intermediate lemmas, enabling effective problem decomposition. On the miniF2F benchmark, ProofCompass demonstrates substantial resource efficiency: it outperforms DSP-v1.5 ($54.9\% \rightarrow 55.3\%$) while using 25x fewer attempts ($3200 \rightarrow 128$). Our synergistic approach paves the way for simultaneously improving computational efficiency and accuracy in formal theorem proving.
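The lemma-selection idea can be illustrated with a toy Lean 4 sketch. This example is hypothetical and not taken from the paper: an LLM-proposed intermediate fact is stated as a `have`, splitting the goal so the specialized prover only needs to close smaller gaps.

```lean
import Mathlib

-- Hypothetical miniF2F-style goal: n * (n + 1) is always even.
theorem mul_succ_mod_two (n : ℕ) : n * (n + 1) % 2 = 0 := by
  -- LLM-suggested intermediate lemma: the product of consecutive
  -- naturals is even. The prover then searches for its proof.
  have h : Even (n * (n + 1)) := Nat.even_mul_succ_self n
  -- Remaining gap: convert evenness to the modular statement.
  exact Nat.even_iff.mp h
```

In ProofCompass's setting, the `have` line plays the role of the decomposition step chosen by the LLM after inspecting failed attempts, while the formal prover supplies the tactic-level details.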