🤖 AI Summary
This work addresses the low efficiency and rigid proof strategies of domain-specific theorem provers (e.g., DeepSeek-Prover-v1.5-RL) in formal mathematical reasoning. We propose ProofCompass, a training-free collaborative framework that leverages large language models (LLMs) to generate natural-language proof strategies and perform failure-path diagnostics, thereby guiding the specialized prover in problem decomposition, intermediate lemma selection, and directed search. Our key contribution is the first zero-shot integration of an LLM as a decoupled strategy controller and diagnostic agent, operating without fine-tuning or architectural modification to either the LLM or the prover, thus preserving both formal correctness and computational efficiency. On the miniF2F benchmark, ProofCompass achieves a 55.3% proof success rate with only 128 attempts, a 25x reduction relative to the baseline's 3200, demonstrating substantial gains in both effectiveness and scalability.
📝 Abstract
Language models have become increasingly powerful tools for formal mathematical reasoning. However, most existing approaches rely exclusively on either large general-purpose models or smaller specialized models, each with distinct limitations, while training specialized large models still requires significant computational resources. This paper introduces ProofCompass, a novel hybrid methodology that achieves remarkable computational efficiency by strategically guiding an existing specialized prover, such as DeepSeek-Prover-v1.5-RL (DSP-v1.5), with a Large Language Model (LLM), without requiring additional model training. The LLM provides natural language proof strategies and analyzes failed attempts to select intermediate lemmas, enabling effective problem decomposition. On the miniF2F benchmark, ProofCompass demonstrates substantial resource efficiency: it outperforms DSP-v1.5 ($54.9\% \rightarrow 55.3\%$) while using 25x fewer attempts ($3200 \rightarrow 128$). Our synergistic approach paves the way for simultaneously improving computational efficiency and accuracy in formal theorem proving.
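The lemma-selection idea can be illustrated with a toy Lean 4 sketch. This example is hypothetical and not taken from the paper: an LLM-proposed intermediate fact is stated as a `have`, splitting the goal so the specialized prover only needs to close smaller gaps.

```lean
import Mathlib

-- Hypothetical miniF2F-style goal: n * (n + 1) is always even.
theorem mul_succ_mod_two (n : ℕ) : n * (n + 1) % 2 = 0 := by
  -- LLM-suggested intermediate lemma: the product of consecutive
  -- naturals is even. The prover then searches for its proof.
  have h : Even (n * (n + 1)) := Nat.even_mul_succ_self n
  -- Remaining gap: convert evenness to the modular statement.
  exact Nat.even_iff.mp h
```

In ProofCompass's setting, the `have` line plays the role of the decomposition step chosen by the LLM after inspecting failed attempts, while the formal prover supplies the tactic-level details.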