AI Summary
Large language models (LLMs) exhibit limited theorem-proving capabilities under two key constraints: scarcity of explicit supervision signals when relying solely on natural-language proofs, and weak proficiency in formal proof languages (e.g., Lean).
Method: This paper introduces a lemma-style whole-proof reasoning framework featuring a closed-loop refinement mechanism that integrates Lean-based formal verification feedback, reuse of already-proven lemmas, and self-summarization. It further designs test-time inference strategies that jointly optimize depth (long chain-of-thought reasoning plus reinforcement learning) and breadth (multi-path exploration), and develops Seed-Geometry, a domain-specific formal reasoning engine for geometry.
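The closed-loop refinement cycle described above can be sketched as follows. This is a minimal illustration, not the paper's actual interface: `propose` stands in for the proof-generating model and `verify` for the Lean checker, and both names are hypothetical.

```python
# Minimal sketch (all names hypothetical) of closed-loop proof refinement:
# propose a proof, check it with a formal verifier, and feed verifier errors
# and any lemmas that already check back into the next attempt.

def refine_proof(problem, propose, verify, max_rounds=3):
    """Iteratively refine a candidate proof using verifier feedback."""
    feedback, lemma_bank = None, []
    for _ in range(max_rounds):
        candidate = propose(problem, feedback, lemma_bank)
        ok, errors, proved_lemmas = verify(candidate)
        lemma_bank.extend(proved_lemmas)  # reuse lemmas that already verify
        if ok:
            return candidate              # verified whole proof
        feedback = errors                 # next attempt sees the errors
    return None                           # gave up after max_rounds
```

The key design point mirrored here is that each round conditions on two signals: the verifier's error messages and the growing bank of verified lemmas.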
Results: The approach proves 78.1% of formalized past IMO problems, saturates MiniF2F, achieves over 50% on PutnamBench, and fully proved 5 of the 6 problems at IMO 2025.
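As a toy illustration of the breadth component of the test-time strategy (multi-path exploration), one can sample several independent proof candidates and keep any that the verifier accepts. `sample_proof` and `verify` below are hypothetical placeholders, not the paper's API.

```python
# Sketch of breadth-style search: sample independent candidates
# (e.g., with different random seeds) until one formally verifies.

def prove_with_breadth(problem, sample_proof, verify, n_samples=8):
    """Return the first sampled candidate that passes verification."""
    for seed in range(n_samples):
        candidate = sample_proof(problem, seed)  # different seed -> different path
        if verify(candidate):
            return candidate
    return None  # no sampled path verified
```

In practice such breadth is combined with the depth dimension (long-chain refinement of each candidate), rather than used alone.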
Abstract
LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose **Seed-Prover**, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine **Seed-Geometry**, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.
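To make "lemma-style whole-proof" reasoning concrete, here is a minimal Lean 4 sketch (an illustrative example of mine, not taken from the paper): a small lemma is stated and proved once, then reused to close the main goal.

```lean
-- Illustrative only: a tiny lemma proved once, then reused in the main goal.
theorem add_zero_right (n : Nat) : n + 0 = n := Nat.add_zero n

theorem main_goal (a b : Nat) : (a + 0) + (b + 0) = a + b := by
  rw [add_zero_right, add_zero_right]
```

The Lean kernel checks every step, which is exactly the kind of unambiguous pass/fail supervision signal the abstract contrasts with natural-language proofs.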