FMC: Formalization of Natural Language Mathematical Competition Problems

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automating the formalization of natural-language mathematical competition problems into machine-checkable proof languages remains a critical challenge in formal mathematics and automated theorem proving. Method: We propose a training-free, large language model (LLM)-based approach that integrates few-shot prompting, multi-round sampling, Lean syntax guidance, and iterative error feedback to achieve end-to-end formalization alignment. Contribution/Results: This is the first method to achieve fully automatic, high-quality formalization on Olympiad-level problems. We introduce MathFormal-3922—the largest bilingual mathematical reasoning benchmark to date—comprising 3,922 natural-language problems and 9,787 corresponding Lean formalizations. Human evaluation shows 64.46% of formalizations attain medium or higher quality, significantly outperforming existing baselines. Our work establishes a scalable methodological framework for automated formalization and provides a rigorous, high-difficulty evaluation benchmark for formal mathematical reasoning.

📝 Abstract
Efficient and accurate autoformalization methods, which leverage large-scale datasets of natural language mathematical problems to construct formal language datasets, are key to advancing formal mathematical reasoning. In this paper, we propose an autoformalization pipeline based on large language models with error feedback, achieving a fully automatic and training-free formalization approach. Using this pipeline, we curate an Olympiad-level dataset aligning natural language problems with Lean formalizations. The dataset comprises $3,922$ mathematical problems in natural language and $9,787$ in Lean, of which $64.46\%$ were assessed as at least above-average quality, making it suitable as a benchmark for automated theorem provers. Additionally, we investigate the formalization and reasoning capabilities of various LLMs and empirically demonstrate that few-shot learning, error feedback, and increasing sampling numbers enhance the autoformalization process. Experiments with three automated theorem provers on the dataset also highlight its challenging nature and its value as a benchmark for formal reasoning tasks.
Problem

Research questions and friction points this paper is trying to address.

Autoformalization of natural language math problems into Lean
Creating a benchmark dataset for automated theorem provers
Enhancing LLMs' formalization with feedback and few-shot learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoformalization pipeline using large language models
Error feedback enhances formalization accuracy
Few-shot learning improves autoformalization process
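The pipeline described above combines few-shot prompting, repeated sampling, and iterative compiler-error feedback. A minimal sketch of that loop is below; `llm_formalize` and `lean_check` are hypothetical stand-ins (a real pipeline would call an LLM API and invoke the Lean compiler), named here only for illustration.

```python
# Sketch of an error-feedback autoformalization loop, assuming hypothetical
# llm_formalize / lean_check interfaces. Not the authors' implementation.

def llm_formalize(problem, few_shot_examples, error=None):
    # Stand-in for an LLM call: few-shot examples are prepended to the
    # prompt, and on retries the previous compiler error is appended so
    # the model can repair its output.
    prompt = "\n\n".join(few_shot_examples) + "\n\n" + problem
    if error:
        prompt += f"\n\nPrevious attempt failed with: {error}"
    # A real call would send `prompt` to a model; here we return a stub.
    return "theorem candidate : True := trivial"

def lean_check(code):
    # Stand-in for running the Lean compiler on `code`.
    # Returns (type_checks, error_message_or_None).
    ok = code.startswith("theorem")
    return ok, (None if ok else "unexpected token")

def autoformalize(problem, few_shot_examples, max_rounds=4):
    """Sample up to max_rounds candidates, feeding each compiler error
    back into the next prompt; return the first candidate that checks."""
    error = None
    for _ in range(max_rounds):
        candidate = llm_formalize(problem, few_shot_examples, error)
        ok, error = lean_check(candidate)
        if ok:
            return candidate
    return None  # every sampling round failed to type-check
```

Under this scheme, increasing `max_rounds` corresponds to the paper's observation that more sampling rounds improve formalization success.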
Jiaxuan Xie
State Key Laboratory for Multimedia Information Processing, School of Computer Science, PKU-Anker LLM Lab, Peking University, Beijing, China
Chengwu Liu
State Key Laboratory for Multimedia Information Processing, School of Computer Science, PKU-Anker LLM Lab, Peking University, Beijing, China
Ye Yuan
State Key Laboratory for Multimedia Information Processing, School of Computer Science, PKU-Anker LLM Lab, Peking University, Beijing, China
Siqi Li
State Key Laboratory for Multimedia Information Processing, School of Computer Science, PKU-Anker LLM Lab, Peking University, Beijing, China
Zhiping Xiao
Postdoc at University of Washington
Ming Zhang
State Key Laboratory for Multimedia Information Processing, School of Computer Science, PKU-Anker LLM Lab, Peking University, Beijing, China