Mathesis: Towards Formal Theorem Proving from Natural Languages

📅 2025-06-08

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Problem: Most LLM-based theorem provers require expert-written formal statements as input, which limits their applicability to mathematical problems stated in natural language. Method: We propose Mathesis, an end-to-end theorem proving pipeline that takes informal problem statements as input. It comprises (i) Mathesis-Autoformalizer, an autoformalization module trained with reinforcement learning; (ii) LeanScorer, a framework for fine-grained assessment of formalization quality; and (iii) Mathesis-Prover, a proof generator that works from the formalized statements. To test real-world applicability, we also introduce Gaokao-Formal, a benchmark of 488 problems from China's national college entrance exam. Contribution/Results: On Gaokao-Formal, the autoformalizer outperforms the best baseline by 22% in pass rate. The full system reaches 64% accuracy on MiniF2F at pass@32 and a state-of-the-art 18% on Gaokao-Formal.


📝 Abstract

Recent advances in large language models show strong promise for formal reasoning. However, most LLM-based theorem provers require expert-written formal statements as inputs, which limits their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem proving pipeline that processes informal problem statements. It contributes Mathesis-Autoformalizer, the first autoformalizer that uses reinforcement learning to improve the formalization of natural-language problems, aided by our novel LeanScorer framework for nuanced assessment of formalization quality. It also includes Mathesis-Prover, which generates formal proofs from the formalized statements. To evaluate the real-world applicability of end-to-end formal theorem proving, we introduce Gaokao-Formal, a benchmark of 488 complex problems from China's national college entrance exam. We study each component of the pipeline in detail. Experiments demonstrate the effectiveness of Mathesis: the autoformalizer outperforms the best baseline by 22% in pass rate on Gaokao-Formal, and the full system surpasses other model combinations, achieving 64% accuracy on MiniF2F with pass@32 and a state-of-the-art 18% on Gaokao-Formal.
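
The abstract reports results at pass@32 but this page does not say how the metric is computed. As a rough illustration, the sketch below assumes the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for a problem with n sampled attempts of which c are verified correct. The function names and the per-problem (n, c) layout are assumptions made for this example, not details taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n attempts with c successes.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k attempts failed, every size-k subset contains a success.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark given (n, c) pairs, one per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

When exactly k = n = 32 proofs are sampled per problem, the estimator reduces to "at least one of the 32 attempts was verified", so the benchmark score is simply the fraction of problems solved at least once.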

Problem

Research questions and friction points this paper is trying to address.

Bridging natural language to formal theorem proving

Enhancing autoformalization with reinforcement learning

Evaluating real-world applicability via Gaokao-Formal benchmark
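
To make the gap between natural language and formal statements concrete, here is a toy Lean 4 example. The problem, the theorem name, and the use of Mathlib are all assumptions chosen for illustration; it is not taken from Gaokao-Formal or MiniF2F. An autoformalizer would be expected to produce the theorem statement, and a prover would then need to replace `sorry` with a proof that Lean accepts.

```lean
import Mathlib

-- Informal problem (hypothetical):
-- "Prove that x^2 - 2x + 1 ≥ 0 for every real number x."
theorem toy_problem (x : ℝ) : x ^ 2 - 2 * x + 1 ≥ 0 := by
  sorry -- placeholder: the proof step is left to the prover
```

A statement can compile in Lean and still misrepresent the informal problem, for example by dropping a hypothesis, which is the kind of error a formalization quality check is meant to catch.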

Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoformalizer enhances formalization via reinforcement learning

LeanScorer provides a novel framework for assessing formalization quality

Prover generates formal proofs from formalized statements
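
The three components listed above can be pictured as one pipeline. The sketch below is a minimal illustration under stated assumptions: the callables `formalize`, `score`, `prove`, and `verify`, the score threshold, and the sampling budgets are all hypothetical placeholders. This page does not describe how Mathesis actually wires its components together.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    formal_statement: Optional[str]  # None if no candidate passed scoring
    proof: Optional[str]             # None if no proof was verified


def run_pipeline(
    problem: str,
    formalize: Callable[[str], str],      # hypothetical autoformalizer
    score: Callable[[str, str], float],   # hypothetical quality scorer
    prove: Callable[[str], str],          # hypothetical proof generator
    verify: Callable[[str, str], bool],   # hypothetical proof checker
    score_threshold: float = 0.5,         # assumed cutoff, not from the paper
    n_candidates: int = 4,                # assumed formalization budget
    n_proofs: int = 32,                   # mirrors the pass@32 budget
) -> PipelineResult:
    # Stage 1: sample candidate formalizations and score each one once.
    candidates = [formalize(problem) for _ in range(n_candidates)]
    scored = [(score(problem, stmt), stmt) for stmt in candidates]
    best_score, best_stmt = max(scored, key=lambda pair: pair[0])

    # Stage 2: reject the problem if even the best candidate scores too low.
    if best_score < score_threshold:
        return PipelineResult(None, None)

    # Stage 3: sample proofs until one verifies or the budget runs out.
    for _ in range(n_proofs):
        proof = prove(best_stmt)
        if verify(best_stmt, proof):
            return PipelineResult(best_stmt, proof)
    return PipelineResult(best_stmt, None)
```

Passing the stages in as callables keeps the sketch independent of any particular model or Lean toolchain, so each stage can be swapped or stubbed out separately.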
