StepFun-Prover Preview: Let's Think and Verify Step by Step

📅 2025-07-27

🤖 AI Summary

This work addresses the unreliability of Lean 4 proofs generated by large language models (LLMs) in formal theorem proving. We propose a tool-augmented, end-to-end reasoning framework that integrates the LLM with the Lean 4 proof environment: the model invokes the environment as a tool, is trained with reinforcement learning on the resulting feedback, and verifies its work step by step as a human prover would. Our key contribution is a closed-loop "generate, execute, feedback, revise" mechanism that improves the reliability of proof generation with minimal sampling. On the miniF2F-test benchmark, our approach achieves a 70.0% pass@1 success rate. The framework offers a promising direction for automated theorem proving and trustworthy mathematical AI assistants.

📝 Abstract

We present StepFun-Prover Preview, a large language model designed for formal theorem proving through tool-integrated reasoning. Using a reinforcement learning pipeline that incorporates tool-based interactions, StepFun-Prover achieves strong performance in generating Lean 4 proofs with minimal sampling. Our approach enables the model to emulate human-like problem-solving strategies by iteratively refining proofs based on real-time environment feedback. On the miniF2F-test benchmark, StepFun-Prover achieves a pass@1 success rate of $70.0\%$. Beyond advancing benchmark performance, we introduce an end-to-end training framework for developing tool-integrated reasoning models, offering a promising direction for automated theorem proving and Math AI assistants.
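
For readers unfamiliar with the pass@1 metric quoted above, the sketch below shows one common way such a number is computed: the unbiased pass@k estimator widely used for code and proof generation, averaged over benchmark problems. This is an illustration only; the abstract does not say how the authors computed their figure, and the function names and the toy data are assumptions made here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of sampled attempts, c: number of attempts that verified,
    k: sampling budget being evaluated (requires k <= n).
    """
    if n - c < k:
        # Every size-k subset of the attempts contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Toy data (not from the paper): 10 problems, one attempt each, 7 verified.
toy_results = [(1, 1)] * 7 + [(1, 0)] * 3
toy_score = benchmark_pass_at_k(toy_results, k=1)
```

With `k=1` the estimator reduces to the fraction of attempts that verify, `c / n`, so pass@1 rewards a model for being right on its first sampled answer rather than for succeeding somewhere in a large sample.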

Problem

Research questions and friction points this paper is trying to address.

Develops a model for formal theorem proving using tools

Enhances proof generation with reinforcement learning

Improves automated theorem proving benchmark performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for tool-integrated reasoning

Iterative proof refinement via environment feedback

End-to-end training for theorem proving models
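
The iterative refinement idea listed above can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `propose` stands in for the language model, `check` stands in for a call to the Lean 4 environment, and `Feedback`, `refine_proof`, and the toy stand-ins are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    ok: bool       # True if the checker accepted the candidate proof
    message: str   # checker output, e.g. an error message


History = List[Tuple[str, Feedback]]


def refine_proof(
    statement: str,
    propose: Callable[[str, History], str],    # hypothetical model call
    check: Callable[[str, str], Feedback],     # hypothetical verifier call
    max_rounds: int = 4,
) -> Tuple[Optional[str], History]:
    """Generate a proof, check it, and revise using the feedback."""
    history: History = []
    for _ in range(max_rounds):
        candidate = propose(statement, history)  # generate (or revise)
        feedback = check(statement, candidate)   # execute in the environment
        history.append((candidate, feedback))    # record feedback for next round
        if feedback.ok:
            return candidate, history
    return None, history


# Toy stand-ins so the sketch runs without a model or a Lean installation.
def toy_propose(statement: str, history: History) -> str:
    # First attempt is deliberately wrong; the retry uses a different tactic.
    return "simp_wrong" if not history else "rfl"


def toy_check(statement: str, proof: str) -> Feedback:
    # Assumption for the demo: only "rfl" is accepted.
    if proof == "rfl":
        return Feedback(True, "no errors")
    return Feedback(False, f"unknown tactic: {proof}")


proof, history = refine_proof("theorem t : 1 + 1 = 2", toy_propose, toy_check)
```

In this toy run the first candidate is rejected and the second is accepted, so `history` holds two entries. In a real system, the feedback messages would be fed back into the model's context, and per the abstract the model itself is trained with reinforcement learning over such interactions.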

🔎 Similar Papers

Lean-STaR: Learning to Interleave Thinking and Proving