DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Contemporary large language models (LLMs) frequently produce mathematically correct final answers while relying on flawed reasoning, especially in rigorous, stepwise tasks such as theorem proving, where supervision based solely on final-answer accuracy is fundamentally insufficient. Method: We propose a generate-verify co-evolution paradigm: (1) train a high-precision LLM-based verifier that scores each reasoning step; (2) use this verifier as a reward model to guide a proof generator in autonomously identifying and correcting logical flaws; and (3) employ reinforcement learning to drive reasoning expansion, test-time compute scaling, and closed-loop data generation for self-consistent verification of open-ended problems that lack canonical solutions. Contribution/Results: Our framework achieves gold-medal performance on IMO 2025 and CMO 2024 and scores 118/120 on Putnam 2024, marking a mathematically grounded, self-verifying, and iteratively refinable reasoning system.

📝 Abstract
Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations. Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute.
Problem

Research questions and friction points this paper is trying to address.

Develop self-verifiable reasoning for rigorous mathematical proofs
Train LLM verifiers to check step-by-step reasoning accuracy
Scale verification to handle complex problems without known solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-verifiable reasoning via LLM-based verifier training
Generator trained with verifier as reward model
Scaled verification compute to label hard proofs
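The loop described above (the verifier scores a proof, and the generator revises it until the verifier-derived reward clears a threshold, spending more test-time compute as needed) can be sketched in toy form. All function names and data shapes below are illustrative assumptions for exposition, not the paper's actual implementation; in particular, the real verifier and generator are both LLMs, stubbed out here with trivial functions.

```python
def verifier_score(proof_steps):
    """Toy stand-in for the LLM verifier: returns the fraction of
    steps it judges valid. The real verifier is itself an LLM that
    scores step-by-step rigor."""
    return sum(1 for s in proof_steps if s["valid"]) / len(proof_steps)

def refine(proof_steps):
    """Toy stand-in for the generator revising its proof: repair the
    first step the verifier flagged."""
    fixed = [dict(s) for s in proof_steps]
    for s in fixed:
        if not s["valid"]:
            s["valid"] = True  # pretend the generator resolved the flaw
            break
    return fixed

def generate_with_self_verification(proof_steps, threshold=1.0, max_rounds=5):
    """Self-verification loop: score the current proof; if the
    verifier-derived reward is below threshold, revise and retry.
    More rounds = more test-time compute."""
    for _ in range(max_rounds):
        if verifier_score(proof_steps) >= threshold:
            break
        proof_steps = refine(proof_steps)
    return proof_steps, verifier_score(proof_steps)

proof = [{"claim": "base case", "valid": True},
         {"claim": "inductive step", "valid": False}]
final_proof, reward = generate_with_self_verification(proof)
```

During training, the reward from this loop would update the generator's policy (e.g. via RL), while hard-to-verify proofs labeled with scaled verification compute would feed back into the verifier's training data, maintaining the generation-verification gap.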