🤖 AI Summary
Low automation in generating formal mathematical proofs in Lean 4 hinders scalable formalization. Method: We propose a subgoal-driven cold-start reinforcement learning framework: (1) a recursive subgoal decomposition mechanism built on DeepSeek-V3 that jointly models informal reasoning chains and formal proof synthesis; (2) a multi-stage cold-start RL training paradigm, using GRPO, that learns an end-to-end mapping from natural-language reasoning to Lean 4 code; (3) ProverBench, a formal benchmark suite spanning middle-school to advanced mathematics competitions, including authentic AIME problems. Results: DeepSeek-Prover-V2-671B achieves an 88.9% pass rate on MiniF2F-test, solves 49 of 658 problems on PutnamBench, and proves 6 of 15 formalized AIME problems, marking substantial progress in formalizing complex theorems.
📝 Abstract
We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem-proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are then synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create cold-start data for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching an 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems to enrich our evaluation, including 15 selected from the recent AIME competitions (2024 and 2025). Evaluation on these 15 AIME problems shows that the model successfully solves 6 of them; in comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.
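The subgoal decomposition described above can be pictured in Lean 4 as splitting a goal into named intermediate `have` steps, each of which can be proved (or delegated to a prover) independently before being recombined. A minimal hand-written sketch, assuming Mathlib; this toy theorem is illustrative and not taken from the paper:

```lean
import Mathlib

-- Illustrative decomposition: the main goal is broken into two subgoals,
-- h₁ and h₂, whose proofs are then composed into the final proof term.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  -- Subgoal 1: a ^ 2 is nonnegative
  have h₁ : 0 ≤ a ^ 2 := sq_nonneg a
  -- Subgoal 2: b ^ 2 is nonnegative
  have h₂ : 0 ≤ b ^ 2 := sq_nonneg b
  -- Recombine the resolved subgoals to close the original goal
  exact add_nonneg h₁ h₂
```

In the paper's pipeline, the role played here by `sq_nonneg` would be filled by proofs generated for each subgoal, while the decomposition itself comes from DeepSeek-V3's informal reasoning.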