🤖 AI Summary
This work addresses the unreliability of Lean 4 proofs generated by large language models (LLMs) in formal theorem proving. We propose a tool-augmented, end-to-end reasoning framework that integrates the LLM with the Lean 4 proof environment: the model invokes the environment as a tool while reasoning, is trained with reinforcement learning on the feedback it receives, and verifies its proofs step by step much as a human would. Our key contribution is a closed-loop “generate–execute–feedback–revise” mechanism, which improves proof-generation reliability when only a few samples are drawn per problem. Evaluated on the miniF2F-test benchmark, our approach achieves a 70.0% pass@1 success rate, a substantial improvement over prior methods. The framework offers a promising direction for automated theorem proving and trustworthy mathematical AI assistants.
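The closed loop can be illustrated with a minimal Python sketch. This is not the authors' implementation: `refine_proof`, `CheckResult`, and the toy generator and checker below are hypothetical names introduced here, and the checker is a stand-in for a call into a real Lean 4 environment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    ok: bool       # True if the proof checker accepted the candidate
    message: str   # checker feedback (e.g. an error message) when rejected


def refine_proof(
    generate: Callable[[str, List[str]], str],
    check: Callable[[str], CheckResult],
    statement: str,
    max_rounds: int = 4,
) -> Optional[str]:
    """Generate, execute, collect feedback, revise; stop on the first accepted proof."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(statement, feedback)  # generate (or revise)
        result = check(candidate)                  # execute in the environment
        if result.ok:
            return candidate
        feedback.append(result.message)            # feed the error back
    return None  # no accepted proof within the round budget


# Toy stand-ins (assumptions for illustration only, no real model or Lean call).
def toy_generate(statement: str, feedback: List[str]) -> str:
    # Pretend the model fixes its proof once it has seen any feedback.
    return "by rfl" if feedback else "by sorry"


def toy_check(proof: str) -> CheckResult:
    if "sorry" in proof:
        return CheckResult(False, "declaration uses 'sorry'")
    return CheckResult(True, "")


proof = refine_proof(toy_generate, toy_check, "theorem t : 1 + 1 = 2")
```

In this toy run the first candidate is rejected, its error message is appended to the feedback list, and the second candidate is accepted. A real system would replace `toy_check` with a call that compiles the candidate in Lean 4 and returns the compiler output.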
📝 Abstract
We present StepFun-Prover Preview, a large language model designed for formal theorem proving through tool-integrated reasoning. Trained with a reinforcement learning pipeline that incorporates tool-based interactions, StepFun-Prover achieves strong performance in generating Lean 4 proofs with minimal sampling. Our approach enables the model to emulate human-like problem-solving strategies by iteratively refining proofs based on real-time environment feedback. On the miniF2F-test benchmark, StepFun-Prover achieves a pass@1 success rate of $70.0\%$. Beyond advancing benchmark performance, we introduce an end-to-end training framework for developing tool-integrated reasoning models, offering a promising direction for automated theorem proving and math AI assistants.
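For readers unfamiliar with the metric, pass@k is commonly computed with the unbiased estimator $1 - \binom{n-c}{k} / \binom{n}{k}$, where $n$ is the number of samples drawn for a problem and $c$ is the number that verify. The sketch below assumes that estimator; the abstract does not state the exact evaluation protocol, and the function names are hypothetical.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct proof.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For $k = 1$ the estimator reduces to $c / n$, so pass@1 is the fraction of single attempts that yield a verified proof, averaged over the benchmark problems.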