Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

📅 2026-04-20

🤖 AI Summary
Existing approaches to autoformalization often overlook the hierarchical logical structure of mathematical statements. This work proposes DSR (Decompose, Structure, and Repair), a framework that formalizes statements modularly. It decomposes each statement into logical components, maps those components to operator trees, and iteratively refines and repairs faulty subtrees, using the topology of the operator tree to localize and correct errors. The work also presents PRIME, a benchmark of 156 undergraduate- and graduate-level theorems from canonical textbooks, expertly formalized in Lean 4. DSR combines large language models with formal verification in a neuro-symbolic pipeline. Under the same computational budget it consistently outperforms existing methods, establishing a new state of the art in autoformalization.
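The summary does not specify DSR's data structures or repair procedure. The following is a minimal Python sketch of the general idea of tree-guided error localization, built on several assumptions. Each operator node carries the Lean fragment it contributes. A `check` callable stands in for formal verification, and a `regenerate` callable stands in for an LLM repair call. Only the deepest failing subtrees are regenerated. The names `OpNode`, `localize`, `repair_tree`, `no_nat_cast`, and `drop_nat_cast` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OpNode:
    """One node of a hypothetical operator tree: an operator plus the formal fragment it emits."""
    op: str                                   # e.g. "forall", "->", "and", "pred"
    code: str                                 # Lean fragment attached to this node
    children: List["OpNode"] = field(default_factory=list)


def localize(node: OpNode, check: Callable[[OpNode], bool]) -> List[OpNode]:
    """Return the deepest failing nodes: nodes that fail `check` while all their descendants pass."""
    failing_below: List[OpNode] = []
    for child in node.children:
        failing_below.extend(localize(child, check))
    if failing_below:
        # The error is already pinned to smaller subtrees; this node is not checked yet.
        return failing_below
    return [] if check(node) else [node]


def repair_tree(root: OpNode,
                check: Callable[[OpNode], bool],
                regenerate: Callable[[OpNode], str],
                budget: int = 3) -> bool:
    """Regenerate only the localized subtrees until every node passes or the budget is spent."""
    for _ in range(budget):
        bad = localize(root, check)
        if not bad:
            return True
        for node in bad:
            node.code = regenerate(node)      # the rest of the tree is left untouched
    return not localize(root, check)


# Toy usage: "the sum of two even integers is even", with a wrong cast in the conclusion.
def no_nat_cast(node: OpNode) -> bool:
    """Toy stand-in for a Lean type-check: reject any fragment mentioning ℕ."""
    return "ℕ" not in node.code


def drop_nat_cast(node: OpNode) -> str:
    """Toy stand-in for an LLM repair call: remove the offending cast."""
    return node.code.replace(" : ℕ", "")


premise = OpNode("and", "Even a ∧ Even b")
concl = OpNode("pred", "Even (a + b : ℕ)")
root = OpNode("forall", "∀ a b : ℤ,", [OpNode("->", "→", [premise, concl])])

print([n.op for n in localize(root, no_nat_cast)])                # → ['pred']
print(repair_tree(root, no_nat_cast, drop_nat_cast), concl.code)  # → True Even (a + b)
```

Repairing the deepest failing nodes first keeps each regeneration small. A parent node is only re-examined after all of its children pass. The abstract does not say whether DSR orders its repairs this way.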

📝 Abstract
Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language problems into formal language. While prior works have focused on data synthesis and diverse training paradigms to optimize end-to-end Large Language Models (LLMs), they typically treat formal code as flat sequences, neglecting the hierarchical logic inherent in mathematical statements. In this work, we introduce Decompose, Structure, and Repair (DSR), a neuro-symbolic framework that restructures autoformalization into a modular pipeline. DSR decomposes statements into logical components and maps them to structured operator trees, leveraging this topological blueprint to precisely localize and repair errors via sub-tree refinement. Furthermore, we introduce PRIME, a benchmark of 156 undergraduate- and graduate-level theorems selected from canonical textbooks and expertly annotated in Lean 4. Experimental results demonstrate that DSR establishes a new state of the art, consistently outperforming baselines under equivalent computational budgets. The datasets, model, and code will be released to the public soon.
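In statement autoformalization, the target output is a Lean 4 theorem statement with its proof left as `sorry`. The example below shows what such a statement looks like. It is a hypothetical illustration rather than an entry from PRIME, and it assumes Mathlib is available. The comment gives one plausible reading of the statement's operator tree; it does not reproduce DSR's actual tree format.

```lean
import Mathlib

-- Hypothetical illustration (not taken from PRIME): the sum of two even integers is even.
-- One possible operator tree: ∀ a b : ℤ ⟶ (Even a ∧ Even b) → Even (a + b)
theorem even_add_of_even_of_even (a b : ℤ) (h : Even a ∧ Even b) : Even (a + b) := by
  sorry
```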
Problem

autoformalization
mathematical statements
hierarchical logic
formal language
natural language
Innovation

neuro-symbolic
autoformalization
operator trees
modular pipeline
error repair