Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dominant paradigms in automated theorem proving rely on large-scale reinforcement learning training, entailing high computational cost and opaque decision-making. Method: This paper introduces DSP+, a neural-symbolic collaborative framework requiring no model training or fine-tuning. It implements fine-grained task division: natural-language subgoal generation, hypothesis-augmented automatic formalization, rule-driven syntactic error correction, and tightly coupled search between the Aesop prover and a tactic-based stepwise prover—enabling efficient draft-sketch-prove coordination. Contribution/Results: DSP+ solves IMO 2019 P1—the first such result on miniF2F—and identifies eight formalization errors. With zero training, it achieves 80.7% (miniF2F), 32.8% (ProofNet), and 24/644 (PutnamBench) solution rates—surpassing state-of-the-art methods under lower computational budget—and generates human-interpretable proof traces.

📝 Abstract
Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable performance. This paper introduces DSP+, an improved version of the Draft, Sketch, and Prove framework, featuring a fine-grained and integrated neuro-symbolic enhancement for each phase: (1) In the draft phase, we prompt reasoning models to generate concise natural-language subgoals to benefit the sketch phase, removing thinking tokens and references to human-written proofs; (2) In the sketch phase, subgoals are autoformalized with hypotheses to benefit the proving phase, and sketch lines containing syntactic errors are masked according to predefined rules; (3) In the proving phase, we tightly integrate symbolic search methods like Aesop with step provers to establish proofs for the sketch subgoals. Experimental results show that, without any additional model training or fine-tuning, DSP+ solves 80.7%, 32.8%, and 24 out of 644 problems from miniF2F, ProofNet, and PutnamBench, respectively, while requiring a lower budget than state-of-the-art methods. DSP+ proves imo_2019_p1, an IMO problem in miniF2F that is not solved by any prior work. Additionally, DSP+ generates proof patterns comprehensible to human experts, facilitating the identification of formalization errors; for example, eight wrongly formalized statements in miniF2F are discovered. Our results highlight the potential of classical reasoning patterns besides RL-based training. All components will be open-sourced.
Problem

Research questions and friction points this paper is trying to address.

Enhancing theorem proving without additional model training
Improving neuro-symbolic coordination in draft, sketch, prove phases
Achieving high performance on miniF2F, ProofNet, PutnamBench problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained neuro-symbolic enhancement for theorem proving
Autoformalization of subgoals with hypotheses
Integration of symbolic search with step provers
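
The sketch-then-prove pattern can be illustrated with a toy Lean 4 example (assuming Mathlib is available; this example is mine, not from the paper): each `have` line plays the role of an autoformalized subgoal carrying its hypotheses, of the kind DSP+ would hand to Aesop or a step prover to close.

```lean
-- Illustrative sketch only (assumes Mathlib); not taken from the paper.
-- Each `have` models an autoformalized subgoal with its hypotheses;
-- in DSP+, such subgoals are discharged by Aesop or a tactic step prover.
import Mathlib

theorem sum_sq_nonneg (a b : ℤ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h1 : 0 ≤ a ^ 2 := sq_nonneg a   -- subgoal 1: closed by a known lemma
  have h2 : 0 ≤ b ^ 2 := sq_nonneg b   -- subgoal 2
  linarith                             -- glue step combining the subgoals
```

In the real system, a reasoning model drafts the subgoals in natural language, they are formalized into `have` statements, and syntactically broken lines are masked before the symbolic provers attempt each one.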
Chenrui Cao
University of Science and Technology of China
Liangcheng Song
Microsoft Research
Zenan Li
Nanjing University
Xinyi Le
Professor, Automation, Shanghai Jiao Tong University
Computational Intelligence · Neural Networks · Control
Xian Zhang
Microsoft Research
Hui Xue
Microsoft Research
Fan Yang
Microsoft Research