Mathesis: Towards Formal Theorem Proving from Natural Languages

📅 2025-06-08
🤖 AI Summary
Existing automated theorem proving (ATP) systems rely heavily on expert-written formal statements, which limits their applicability to mathematical problems posed in natural language. Method: Mathesis is proposed as the first end-to-end theorem proving pipeline that takes informal problem statements as input. It comprises (i) Mathesis-Autoformalizer, a reinforcement learning-based autoformalization module; (ii) LeanScorer, a fine-grained framework for evaluating formalization quality; and (iii) Mathesis-Prover, a proof generator that works from the formalized statements. Contribution/Results: On Gaokao-Formal, a benchmark of 488 problems from China's national college entrance exam, the autoformalizer outperforms the best baseline by 22% in pass rate. The full system achieves 64% pass@32 on MiniF2F and a state-of-the-art 18% on Gaokao-Formal.

📝 Abstract
Recent advances in large language models show strong promise for formal reasoning. However, most LLM-based theorem provers have long been constrained by the need for expert-written formal statements as inputs, limiting their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem proving pipeline processing informal problem statements. It contributes Mathesis-Autoformalizer, the first autoformalizer using reinforcement learning to enhance the formalization ability of natural language problems, aided by our novel LeanScorer framework for nuanced formalization quality assessment. It also proposes a Mathesis-Prover, which generates formal proofs from the formalized statements. To evaluate the real-world applicability of end-to-end formal theorem proving, we introduce Gaokao-Formal, a benchmark of 488 complex problems from China's national college entrance exam. Our approach is carefully designed, with a thorough study of each component. Experiments demonstrate Mathesis's effectiveness, with the autoformalizer outperforming the best baseline by 22% in pass-rate on Gaokao-Formal. The full system surpasses other model combinations, achieving 64% accuracy on MiniF2F with pass@32 and a state-of-the-art 18% on Gaokao-Formal.
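The pass@32 figures above are commonly read as "a problem counts as solved if at least one of 32 sampled proof attempts is accepted by the proof checker". The following is a minimal sketch of that counting rule, assuming this reading; the function name `pass_at_k`, the `example` data, and the problem identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one accepted proof in the first k attempts.

    `attempts` maps a problem id to a list of booleans, one per sampled proof,
    where True means the proof checker accepted that attempt (assumed format).
    """
    if not attempts:
        raise ValueError("no problems given")
    if k < 1:
        raise ValueError("k must be at least 1")
    solved = sum(1 for results in attempts.values() if any(results[:k]))
    return solved / len(attempts)


# Hypothetical checker outcomes for four problems, three attempts each.
example = {
    "problem_a": [False, True, False],
    "problem_b": [False, False, False],
    "problem_c": [True, False, False],
    "problem_d": [False, False, True],
}

print(pass_at_k(example, 1))  # only problem_c is solved on the first attempt
print(pass_at_k(example, 3))  # problem_b is never solved
```

Under this rule the score can only stay the same or grow as k increases, which is why results at different sampling budgets are not directly comparable.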
Problem

Research questions and friction points this paper is trying to address.

Bridging natural language to formal theorem proving
Enhancing autoformalization with reinforcement learning
Evaluating real-world applicability via Gaokao-Formal benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoformalizer enhances formalization via reinforcement learning
LeanScorer assesses formalization quality with novel framework
Prover generates formal proofs from formalized statements
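
To make the two stages concrete, here is a small illustrative sketch in Lean 4 with Mathlib. It is not drawn from Gaokao-Formal and is not output of Mathesis; the informal problem, the theorem name `sq_expr_nonneg`, and the chosen tactic are assumptions made only for illustration. The theorem statement plays the role of an autoformalizer's output, and the tactic proof plays the role of a prover's output.

```lean
import Mathlib

-- Informal problem (hypothetical): "Show that x^2 - 2x + 1 ≥ 0 for every real number x."

-- Formalized statement: the kind of artifact an autoformalizer would produce.
theorem sq_expr_nonneg (x : ℝ) : x ^ 2 - 2 * x + 1 ≥ 0 := by
  -- Formal proof: the kind of artifact a prover would produce.
  -- The hint states (x - 1)^2 ≥ 0, which expands to the goal.
  nlinarith [sq_nonneg (x - 1)]
```

A formalization can type-check and still misstate the informal problem, for example by dropping a hypothesis, which is the gap a quality-assessment step such as LeanScorer is meant to address.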
Authors
Xuejun Yu
Huawei Celia Team
Jianyuan Zhong
The Chinese University of Hong Kong
Machine Learning
Zijin Feng
The Chinese University of Hong Kong
Large Language Models, Data Mining
Pengyi Zhai
Huawei Celia Team
Roozbeh Yousefzadeh
Huawei Noah’s Ark Lab
Wei Chong Ng
Huawei Celia Team
Haoxiong Liu
Huawei Noah’s Ark Lab
Ziyi Shou
Huawei Celia Team
Jing Xiong
Huawei Noah’s Ark Lab
Yudong Zhou
Huawei Celia Team
Claudia Beth Ong
Huawei Celia Team
Austen Jeremy Sugiarto
Huawei Celia Team
Yaoxi Zhang
Huawei Celia Team
Wai Ming Tai
Huawei Celia Team
Huan Cao
Huawei Celia Team
Dongcai Lu
Huawei Tech.
Large Language Model, Math Reasoning, robotics
Jiacheng Sun
Huawei Noah’s Ark Lab
Qiang Xu
The Chinese University of Hong Kong
Shen Xin
Huawei Celia Team
Zhenguo Li
Huawei Noah's Ark Lab, Columbia, CUHK, PKU
machine learning, generative AI, AI for mathematics