Autoformalizer with Tool Feedback

πŸ“… 2025-10-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing automated formalization methods exhibit significant deficiencies in syntactic correctness and semantic consistency. To address this, we propose Autoformalizer with Tool Feedback (ATF), the first framework to jointly leverage Lean 4 compiler syntax feedback and a multi-LLM semantic adjudication mechanism for closed-loop optimization of formalization generation. ATF employs a three-stage training strategy: synthetic-data cold-start initialization, expert-guided iterative fine-tuning, and direct preference optimization (DPO). In both automated and human evaluations, ATF substantially outperforms state-of-the-art baselines and demonstrates strong reasoning generalization. To foster community advancement, we release Numina-ATFβ€”a high-quality synthetic dataset comprising 750,000 formalized statements. This work establishes a scalable, verifiable paradigm for reliable translation from natural language to formal mathematical propositions.

πŸ“ Abstract
Autoformalization addresses the scarcity of data for Automated Theorem Proving (ATP) by translating mathematical problems from natural language into formal statements. Recent efforts have shifted from directly prompting large language models to training end-to-end formalizer models from scratch, achieving remarkable advancements. However, existing formalizers still struggle to consistently generate valid statements that meet syntactic validity and semantic consistency. To address this issue, we propose the Autoformalizer with Tool Feedback (ATF), a novel approach that incorporates syntactic and consistency information as tools into the formalization process. By integrating the Lean 4 compiler for syntax corrections and employing a multi-LLM-as-judge approach for consistency validation, the model adaptively refines generated statements according to tool feedback, enhancing both syntactic validity and semantic consistency. The training of ATF involves a cold-start phase on synthetic tool-calling data, an expert iteration phase to improve formalization capabilities, and Direct Preference Optimization to alleviate ineffective revisions. Experimental results show that ATF markedly outperforms a range of baseline formalizer models, with its superior performance further validated by human evaluations. Subsequent analysis reveals that ATF exhibits excellent inference-scaling properties. Moreover, we open-source Numina-ATF, a dataset containing 750K synthetic formal statements, to facilitate advances in autoformalization and ATP research.
Problem

Research questions and friction points this paper is trying to address.

Autoformalization struggles with generating syntactically valid formal statements
Existing methods lack semantic consistency in mathematical problem translation
Current approaches cannot consistently produce correct ATP-ready formalizations
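To make the task concrete, the target of autoformalization is a formal statement, not a proof. A hedged illustration (this example is ours, not taken from the paper): the natural-language problem "the sum of two even numbers is even" could be rendered as the following Lean 4 statement, with the proof left as `sorry` since only the statement is being formalized.

```lean
-- Natural language: "The sum of two even numbers is even."
-- Only the statement matters for autoformalization; the proof is elided.
theorem even_add_even (m n : β„•) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  sorry
```

A syntactically invalid rendering (e.g. a malformed binder) would be caught by the Lean 4 compiler, while a type-correct but wrong translation (e.g. formalizing `Even (m * n)` instead) can only be caught by the semantic consistency check.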
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Lean 4 compiler for syntax corrections
Uses multi-LLMs-as-judge for consistency validation
Employs Direct Preference Optimization to curb ineffective revisions
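The closed loop described above can be sketched in a few lines. This is a hypothetical illustration of the control flow only: `compile_check`, `judge_consistency`, and `revise` are placeholder names standing in for the Lean 4 compiler call, the multi-LLM-as-judge vote, and the formalizer's revision step; none of them are the authors' actual API.

```python
def compile_check(statement):
    """Stand-in for invoking the Lean 4 compiler.

    Returns (ok, error_message). The real check runs the compiler; this
    toy version just flags obviously incomplete statements.
    """
    if "sorry" in statement or not statement.startswith("theorem"):
        return False, "syntax error: incomplete statement"
    return True, ""

def judge_consistency(problem, statement, num_judges=3):
    """Stand-in for the multi-LLM-as-judge majority vote on whether the
    formal statement matches the natural-language problem."""
    votes = ["x" in statement for _ in range(num_judges)]  # placeholder verdicts
    return sum(votes) > num_judges // 2

def autoformalize_with_feedback(problem, draft, revise, max_rounds=3):
    """Refine a draft statement until it compiles and the judges accept it,
    feeding each tool's error message back into the revision step."""
    statement = draft
    for _ in range(max_rounds):
        ok, err = compile_check(statement)
        if not ok:
            statement = revise(statement, f"compiler feedback: {err}")
            continue
        if judge_consistency(problem, statement):
            return statement
        statement = revise(statement, "judge feedback: semantic mismatch")
    return statement
```

The key design point the paper's loop captures is that the two tools gate different failure modes: the compiler rejects syntactically invalid statements, and the judge ensemble rejects compilable statements that do not mean what the problem says, so a statement is only returned once it passes both.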
πŸ‘₯ Authors
Qi Guo
National Engineering Research Center for Software Engineering, Peking University, Beijing, China
Jianing Wang
Meituan Group, Beijing, China
Jianfei Zhang
Meituan Group, Beijing, China
Deyang Kong
Peking University
Xiangzhou Huang
Meituan Group, Beijing, China
Xiangyu Xi
Peking University; Meituan Group
Wei Wang
Meituan Group, Beijing, China
Jingang Wang
Meituan
Xunliang Cai
Meituan Group, Beijing, China
Shikun Zhang
Peking University
Wei Ye
National Engineering Research Center for Software Engineering, Peking University, Beijing, China