Autoformalization in the Era of Large Language Models: A Survey

📅 2025-05-29
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This survey addresses autoformalization: the automatic translation of informal mathematical statements into formal proof languages (e.g., Lean, Isabelle), with the aim of advancing theorem-proving automation and improving the verifiability of large language model (LLM) mathematical reasoning. Methodologically, it brings together formal logic, mathematical knowledge representation, LLM fine-tuning and prompt engineering, cross-modal alignment, and formal verification toolchains into an end-to-end autoformalization pipeline. Its contributions are threefold: first, it establishes autoformalization as a paradigm for enhancing the trustworthiness of LLM mathematical reasoning, unifying technical developments from both the mathematical-logic and LLM perspectives; second, it systematically catalogues open-source models, benchmark datasets, and core challenges, presenting the most comprehensive research landscape to date; third, it identifies critical technical bottlenecks and outlines concrete development pathways, fostering synergistic advances in automated theorem proving and trustworthy LLM-based mathematical reasoning.

📝 Abstract
Autoformalization, the process of transforming informal mathematical propositions into verifiable formal representations, is a foundational task in automated theorem proving, offering a new perspective on the use of mathematics in both theoretical and applied domains. Driven by the rapid progress in artificial intelligence, particularly large language models (LLMs), this field has witnessed substantial growth, bringing both new opportunities and unique challenges. In this survey, we provide a comprehensive overview of recent advances in autoformalization from both mathematical and LLM-centric perspectives. We examine how autoformalization is applied across various mathematical domains and levels of difficulty, and analyze the end-to-end workflow from data preprocessing to model design and evaluation. We further explore the emerging role of autoformalization in enhancing the verifiability of LLM-generated outputs, highlighting its potential to improve both the trustworthiness and reasoning capabilities of LLMs. Finally, we summarize key open-source models and datasets supporting current research, and discuss open challenges and promising future directions for the field.
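To make the task concrete, the following is a minimal sketch of the kind of output an autoformalizer targets (assuming Lean 4 with Mathlib; the theorem name is illustrative), for the informal statement "the sum of two even natural numbers is even":

```lean
-- Informal input: "The sum of two even natural numbers is even."
-- Candidate formalization and proof in Lean 4 / Mathlib:
theorem even_add_even (m n : ℕ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  obtain ⟨a, ha⟩ := hm        -- m = a + a
  obtain ⟨b, hb⟩ := hn        -- n = b + b
  exact ⟨a + b, by omega⟩     -- m + n = (a + b) + (a + b)
```

The key point of the task is that such output is machine-checkable: the proof assistant either accepts the formalization or rejects it, giving a verifiable signal that informal LLM reasoning lacks.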
Problem

Research questions and friction points this paper is trying to address.

Transforming informal math into formal verifiable representations
Applying autoformalization across diverse mathematical domains
Enhancing LLM output verifiability and reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoformalization transforms informal math into formal representations
Autoformalization enhances verifiability of LLM-generated reasoning
End-to-end workflow from data preprocessing to evaluation
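The end-to-end workflow in the last bullet can be sketched as a translate-then-verify loop. This is a hypothetical illustration, not the paper's implementation: `llm_translate` and `verify` are stand-ins for an LLM API call and a proof-assistant check (e.g. compiling candidate Lean code), and the retry-with-feedback structure is one common design among the surveyed systems.

```python
from typing import Optional, Tuple

def llm_translate(informal: str, feedback: str) -> str:
    """Stand-in for an LLM call that drafts a formal statement,
    optionally conditioned on checker feedback from a prior attempt."""
    return f"theorem example_stmt : {informal} := by sorry"

def verify(candidate: str) -> Tuple[bool, str]:
    """Stand-in for a proof-assistant check; a real system would
    invoke the Lean/Isabelle toolchain and return its error output."""
    return ("theorem" in candidate, "")

def formalize(informal: str, max_attempts: int = 3) -> Optional[str]:
    """Translate-then-verify loop: retry with checker feedback on failure."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = llm_translate(informal, feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate
    return None
```

The loop terminates early on the first candidate the checker accepts, which is what makes the proof assistant an automatic filter for LLM output quality.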
👥 Authors
Ke Weng (Northeastern University)
Lun Du (Ant Research Institute, Ant Group)
Sirui Li (Northeastern University)
Wangyue Lu (Northeastern University)
Haozhe Sun (Northeastern University)
Hengyu Liu (Department of Computer Science, Aalborg University)
Tiancheng Zhang (Northeastern University, China)
Profile tags: deep learning, machine learning, intelligent education