Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization

📅 2025-07-09
🤖 AI Summary
Automated formalization of natural language statements lacks reliable, efficient, and semantically sensitive evaluation metrics. To address this, we propose GTED, a novel framework that (1) standardizes formalized statements, (2) constructs operator trees capturing their logical structure, and (3) computes the Generalized Tree Edit Distance via dynamic programming to quantify semantic similarity. GTED is the first structure-aware tree edit distance metric tailored for formalization quality assessment, and it targets three limitations of prior approaches: weak semantic understanding, high computational overhead, and dependence on theorem provers. On the miniF2F and ProofNet benchmarks, GTED outperforms all baseline metrics, achieving the highest accuracy and Cohen’s Kappa scores.
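
The tree-comparison step can be illustrated with a minimal sketch. This is a plain ordered tree edit distance with unit costs, computed by memoized recursion over forests; it is not the authors' generalized variant and does not reproduce their standardization step. The `node` helper, the tuple encoding of operator trees, and the clamped normalization in `similarity` are assumptions made only for this illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children is a tuple of trees.
def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the root of the rightmost tree of g; its children are promoted.
        _, w_children = g[-1]
        return forest_distance(f, g[:-1] + w_children) + 1
    if not g:
        # Delete the root of the rightmost tree of f; its children are promoted.
        _, v_children = f[-1]
        return forest_distance(f[:-1] + v_children, g) + 1
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_children, g) + 1
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots, relabeling if their labels differ.
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def size(tree):
    return 1 + sum(size(child) for child in tree[1])

def similarity(t1, t2):
    # Assumed normalization: divide by the larger tree size and clamp at 0.
    d = tree_edit_distance(t1, t2)
    return max(0.0, 1.0 - d / max(size(t1), size(t2)))

# Hypothetical operator trees for "a + b = c" and "b + a = c".
t1 = node("=", node("+", node("a"), node("b")), node("c"))
t2 = node("=", node("+", node("b"), node("a")), node("c"))
print(tree_edit_distance(t1, t2), similarity(t1, t2))
```

In this toy example the two trees differ only in the order of the operands, so an order-sensitive distance still charges two edits; handling such equivalences is one reason a standardization step before tree construction matters.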

📝 Abstract
Statement autoformalization, the automated translation of statements from natural language into formal languages, has become a subject of extensive research, yet the development of robust automated evaluation metrics remains limited. Existing evaluation methods often lack semantic understanding, face challenges with high computational costs, and are constrained by the current progress of automated theorem proving. To address these issues, we propose GTED (Generalized Tree Edit Distance), a novel evaluation framework that first standardizes formal statements and converts them into operator trees, then determines the semantic similarity using the eponymous GTED metric. On the miniF2F and ProofNet benchmarks, GTED outperforms all baseline metrics by achieving the highest accuracy and Kappa scores, thus providing the community with a more faithful metric for automated evaluation. The code and experimental results are available at https://github.com/XiaoyangLiu-sjtu/GTED.
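
For readers unfamiliar with the Kappa score mentioned above, the following is a minimal sketch of Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance. The function name and the label lists are hypothetical illustration data, not results from the paper; in this setting one rater would be the human judgment and the other the thresholded metric.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used the same single label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical labels: 1 = formalization judged correct, 0 = judged incorrect.
human = [1, 1, 0, 0]
metric = [1, 0, 0, 0]
print(cohens_kappa(human, metric))
```

Unlike raw accuracy, Kappa discounts agreement that would occur by chance, which is why the two scores are reported together.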
Problem

Research questions and friction points this paper is trying to address.

Lack of robust metrics for statement autoformalization evaluation
High computational costs in existing evaluation methods
Limited semantic understanding in current evaluation approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardizes formal statements and converts them into operator trees
Uses the GTED metric to measure semantic similarity between operator trees
Achieves the highest accuracy and Kappa scores on the miniF2F and ProofNet benchmarks
Yuntian Liu
School of Mathematical Sciences, Shanghai Jiao Tong University
Tao Zhu
School of Mathematical Sciences, Shanghai Jiao Tong University
Xiaoyang Liu
School of Mathematical Sciences, Shanghai Jiao Tong University
Yu Chen
School of Mathematical Sciences, Shanghai Jiao Tong University
Zhaoxuan Liu
School of Mathematical Sciences, Shanghai Jiao Tong University
Qingfeng Guo
School of Mathematical Sciences, Shanghai Jiao Tong University
Jiashuo Zhang
Peking University
Software Engineering, LLM4SE, Smart Contract
Kangjie Bao
School of Mathematical Sciences, Shanghai Jiao Tong University
Tao Luo
School of Mathematical Sciences, Shanghai Jiao Tong University; Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University; CMA-Shanghai, Shanghai Artificial Intelligence Laboratory