🤖 AI Summary
Problem: Formal mathematical proof generation remains challenging because writing machine-checkable proofs carries high cognitive and technical barriers.
Method: This paper proposes a lightweight, end-to-end approach within the Lean 3 framework that uses only ChatGPT (via API) and prompt-driven proof sketch generation, augmented by depth-first backtracking search and an automated verification feedback loop. It requires no model fine-tuning and no complex reasoning architecture.
Contribution/Results: It introduces the first practical integration of commercial large language models with a verifiable formal language (Lean) for cooperative proof synthesis. Evaluated on the miniF2F benchmark, the method achieves a 31.15% formal verification pass rate, surpassing all non-fine-tuned baselines. Cross-dataset and multi-model experiments confirm its generalizability and robustness. Because it needs no specialized training or infrastructure, the approach substantially improves both the efficiency and the accessibility of formal proof generation.
📝 Abstract
The challenge of formal proof generation has a rich history, but with modern techniques, we may finally be at the stage of making actual progress on real-life mathematical problems. This paper explores the integration of ChatGPT and basic search techniques to simplify the generation of formal proofs, with a particular focus on the miniF2F dataset. We demonstrate how combining a large language model like ChatGPT with a formal language such as Lean, which has the added advantage of being verifiable, enhances the efficiency and accessibility of formal proof generation. Despite its simplicity, our best-performing Lean-based model surpasses all known benchmarks with a 31.15% pass rate. We extend our experiments to other datasets and to alternative language models, showing that our models perform comparably in diverse settings and allowing a more nuanced analysis of our results. Our findings offer insights into AI-assisted formal proof generation and suggest a promising direction for future research in formal mathematical proof.
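The combination described above, a language model proposing proof steps, a verifier checking them, and a depth-first search that backtracks from dead ends, can be illustrated with a minimal sketch. This is not the paper's implementation: `dfs_prove`, `CheckResult`, `propose`, and `check` are hypothetical names, and the toy stand-ins below replace the ChatGPT call and the Lean checker with simple arithmetic so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    valid: bool      # the verifier accepts the steps so far
    complete: bool   # nothing is left to prove
    message: str     # verifier output, passed to the next proposal call


def dfs_prove(
    propose: Callable[[List[str], str], List[str]],
    check: Callable[[List[str]], CheckResult],
    max_depth: int = 8,
) -> Optional[List[str]]:
    """Depth-first search over proof steps; returns a complete proof or None."""

    def visit(steps: List[str], feedback: str) -> Optional[List[str]]:
        if len(steps) >= max_depth:
            return None
        # In a real system, propose() would wrap an LLM call (assumption).
        for candidate in propose(steps, feedback):
            attempt = steps + [candidate]
            result = check(attempt)  # would call the proof checker (assumption)
            if not result.valid:
                continue  # rejected step: try the next candidate
            if result.complete:
                return attempt
            found = visit(attempt, result.message)
            if found is not None:
                return found
            # Every continuation failed: backtrack and try another candidate.
        return None

    return visit([], "")


# Toy stand-ins: a "proof" is a list of increments that must sum to exactly 5.
def toy_propose(steps: List[str], feedback: str) -> List[str]:
    return ["+2", "+3"]


def toy_check(steps: List[str]) -> CheckResult:
    total = sum(int(s) for s in steps)
    return CheckResult(valid=total <= 5, complete=total == 5,
                       message=f"remaining: {5 - total}")


print(dfs_prove(toy_propose, toy_check))  # prints ['+2', '+3']
```

In the toy run, the search first follows `+2, +2`, finds that both continuations overshoot, backtracks, and then succeeds with `+2, +3`. In this sketch only the verifier message for an accepted prefix is fed back to the proposer; how the paper's feedback loop uses verifier output is not specified here.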