ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Autoformalization suffers from semantic distortion: large language models often generate syntactically correct but semantically inaccurate formal statements, lacking the reflective reasoning and iterative refinement that human experts apply. To address this, the authors propose ReForm, a reflective autoformalization framework built around a generate-evaluate-self-correct loop and trained with Prospective Bounded Sequence Optimization (PBSO), a sequence-level reinforcement learning scheme that applies different rewards at different sequence positions so the model learns both accurate formalization and trustworthy semantic-consistency assessment. To support evaluation, they introduce ConsistencyCheck, an 859-item expert-annotated benchmark of semantic fidelity. Experiments show that ReForm achieves an average improvement of 17.2 percentage points over the strongest baselines across four mainstream benchmarks, and ConsistencyCheck reveals that even expert-written formalizations contain semantic errors in up to 38.5% of cases, underscoring the task's inherent difficulty.
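To see the failure mode concretely, consider a hypothetical example (ours, not from the paper) in Lean 4 with Mathlib: swapping two quantifiers produces a statement that still parses and type-checks, yet no longer means what the natural-language problem says.

```lean
import Mathlib

-- Faithful formalization of "for every real x there is a natural number n
-- with n > x" (the Archimedean property):
theorem archimedean_faithful : ∀ x : ℝ, ∃ n : ℕ, (n : ℝ) > x := by
  intro x
  exact exists_nat_gt x

-- Semantically distorted formalization: the quantifiers are swapped. The
-- statement is still syntactically well-formed, but it now asserts a single
-- n larger than every real, which is false; no syntax-level check can
-- flag this kind of error.
theorem archimedean_distorted : ∃ n : ℕ, ∀ x : ℝ, (n : ℝ) > x := by
  sorry -- unprovable: the distorted statement is false
```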

📝 Abstract
Autoformalization, which translates natural language mathematics into machine-verifiable formal statements, is critical for using formal mathematical reasoning to solve math problems stated in natural language. While Large Language Models can generate syntactically correct formal statements, they often fail to preserve the original problem's semantic intent. This limitation arises because existing LLM approaches treat autoformalization as a simple translation task, lacking the mechanisms for self-reflection and iterative refinement that human experts naturally employ. To address these issues, we propose ReForm, a Reflective Autoformalization method that tightly integrates semantic consistency evaluation into the autoformalization process. This enables the model to iteratively generate formal statements, assess their semantic fidelity, and self-correct identified errors through progressive refinement. To effectively train this reflective model, we introduce Prospective Bounded Sequence Optimization (PBSO), which employs different rewards at different sequence positions to ensure that the model develops both accurate autoformalization and correct semantic validations, preventing superficial critiques that would undermine the purpose of reflection. Extensive experiments across four autoformalization benchmarks demonstrate that ReForm achieves an average improvement of 17.2 percentage points over the strongest baselines. To further ensure evaluation reliability, we introduce ConsistencyCheck, a benchmark of 859 expert-annotated items that not only validates LLMs as judges but also reveals that autoformalization is inherently difficult: even human experts produce semantic errors in up to 38.5% of cases.
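As a rough sketch of the generate-assess-correct cycle the abstract describes, here is a minimal Python version. The `model` interface (`generate_formalization`, `assess_consistency`, `refine`) is our hypothetical placeholder, not the paper's API; ReForm itself realizes all three steps within a single model's reflective output sequence.

```python
def reflective_autoformalize(model, problem: str, max_rounds: int = 4) -> str:
    """Iteratively formalize `problem`, stopping once the model's own
    semantic-consistency check passes or the reflection budget runs out."""
    statement = model.generate_formalization(problem)   # initial attempt
    for _ in range(max_rounds):
        # Self-assessment: does the formal statement preserve the problem's intent?
        is_faithful, critique = model.assess_consistency(problem, statement)
        if is_faithful:
            return statement
        # Feed the critique back so the next attempt targets the identified error.
        statement = model.refine(problem, statement, critique)
    return statement  # best effort after the budget is exhausted
```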
Problem

Research questions and friction points this paper is trying to address.

Improving semantic fidelity in natural-to-formal mathematics translation
Addressing LLMs' lack of self-reflection during autoformalization
Developing iterative refinement for machine-verifiable formal statements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates semantic evaluation into the autoformalization process
Uses iterative generation and self-correction for refinement
Employs position-specific rewards to train accurate self-validation (see the sketch after this list)
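The position-specific reward idea can be sketched as follows. Everything here (the segment kinds, the per-token expansion, the concrete reward values) is an illustrative assumption about how position-dependent sequence rewards might be wired up, not the paper's actual PBSO formulation; the point is only that formalization tokens and critique tokens receive separately computed rewards, so an inaccurate critique is penalized even when the surrounding formalization is rewarded.

```python
from typing import List, Tuple

def positional_rewards(
    segments: List[Tuple[str, int]],   # (segment_kind, token_count) in generation order
    formalization_reward: float,       # e.g., +1 if the final statement compiles and is faithful
    critique_reward: float,            # e.g., +1 if the self-verdict matches an oracle check
) -> List[float]:
    """Expand segment-level rewards into one reward per token position, so the
    RL objective credits formalization tokens and critique tokens separately."""
    rewards: List[float] = []
    for kind, length in segments:
        if kind == "formalization":
            rewards.extend([formalization_reward] * length)
        elif kind == "critique":
            # Reward the critique for *accuracy*, not for merely sounding critical,
            # which is what discourages superficial self-evaluation.
            rewards.extend([critique_reward] * length)
        else:
            rewards.extend([0.0] * length)  # neutral tokens carry no learning signal
    return rewards

# Usage: a trajectory with a 120-token formalization, a 40-token critique,
# and a 110-token correction gets three distinct reward bands.
r = positional_rewards(
    [("formalization", 120), ("critique", 40), ("formalization", 110)],
    formalization_reward=1.0,
    critique_reward=-1.0,   # e.g., the critique wrongly passed a flawed statement
)
assert len(r) == 270
```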
👥 Authors
Guoxin Chen
Gaoling School of Artificial Intelligence, Renmin University of China
Jing Wu
Tongyi Lab, Alibaba Group
Xinjie Chen
Tongyi Lab, Alibaba Group
Wayne Xin Zhao
Professor, Renmin University of China
Recommender System, Natural Language Processing, Large Language Model
Ruihua Song
Renmin University of China
AI-based creation, multi-modality chitchat, natural language understanding, information retrieval, information extraction
Chengxi Li
Tongyi Lab, Alibaba Group
Kai Fan
ByteDance
Machine learning, Bayesian Deep Learning, Machine translation, LLMs
Dayiheng Liu
Tongyi Lab, Alibaba Group
Minpeng Liao
Tongyi Lab, Alibaba Group