CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

📅 2024-07-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing grammatical error correction (GEC) evaluation metrics suffer from poor interpretability, hindering precise localization of system deficiencies. To address this, we propose CLEME2.0—a reference-based, interpretable metric that, for the first time, decouples GEC evaluation into four semantically explicit and attributable edit categories: correct corrections, erroneous corrections, missed corrections, and over-corrections. Leveraging rule-guided fine-grained alignment and a statistical assessment framework, CLEME2.0 models edit classification grounded in reference corrections. Evaluated on two human-annotated datasets and six benchmark datasets, it achieves state-of-the-art performance, significantly improving correlation with human judgments. CLEME2.0 not only surpasses all existing reference-based and reference-free metrics in overall effectiveness—particularly excelling in correlation-based evaluation—but also enables system-level diagnostic analysis and targeted model improvement.

📝 Abstract
The paper focuses on the interpretability of Grammatical Error Correction (GEC) evaluation metrics, which has received little attention in previous studies. To bridge the gap, we introduce **CLEME2.0**, a reference-based metric describing four fundamental aspects of GEC systems: hit-correction, wrong-correction, under-correction, and over-correction. Together, these aspects expose critical qualities of GEC systems and help locate their drawbacks. Evaluating systems by combining these aspects also leads to superior human consistency over other reference-based and reference-less metrics. Extensive experiments on two human judgment datasets and six reference datasets demonstrate the effectiveness and robustness of our method, achieving a new state-of-the-art result. Our code is released at https://github.com/THUKElab/CLEME.
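The four aspects can be illustrated with a minimal sketch. This is not the paper's actual chunk-alignment algorithm; it assumes a simplified representation where each edit is a `(span, replacement)` pair, and the function name `classify_edits` is hypothetical.

```python
# Hypothetical sketch of the four edit categories, assuming each edit
# is a (span, replacement) pair keyed by character or token offsets.
# The real CLEME2.0 uses rule-guided fine-grained alignment instead.

def classify_edits(system_edits, reference_edits):
    """Count hit-, wrong-, over-, and under-corrections."""
    ref = {span: repl for span, repl in reference_edits}
    counts = {"hit": 0, "wrong": 0, "over": 0, "under": 0}
    for span, repl in system_edits:
        if span in ref:
            # The system edited a span the reference also edits:
            # a hit if the replacement matches, otherwise wrong.
            counts["hit" if repl == ref[span] else "wrong"] += 1
        else:
            # The system edited a span the reference leaves untouched.
            counts["over"] += 1
    edited_spans = {span for span, _ in system_edits}
    # Reference edits the system never attempted.
    counts["under"] = sum(1 for span in ref if span not in edited_spans)
    return counts

# Example: the reference fixes two errors; the system hits one,
# misses one, and introduces one spurious edit.
ref_edits = [((3, 4), "went"), ((7, 8), "the")]
sys_edits = [((3, 4), "went"), ((10, 11), "a")]
print(classify_edits(sys_edits, ref_edits))
# → {'hit': 1, 'wrong': 0, 'over': 1, 'under': 1}
```

Scoring a system then reduces to combining these four counts, which is what makes each component of the final score attributable to a concrete system behavior.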
Problem

Research questions and friction points this paper is trying to address.

Improving interpretability of GEC evaluation metrics
Disentangling GEC system edits into four aspects
Enhancing human consistency in GEC evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangles GEC edits into four aspects
Enhances interpretability of GEC evaluation
Achieves state-of-the-art human consistency
Jingheng Ye
Tsinghua University
Zishan Xu
Tsinghua University
Yinghui Li
Tsinghua University
Xuxin Cheng
University of California, San Diego
Linlin Song
Huazhong University of Science and Technology
Qingyu Zhou
Hai-Tao Zheng
Tsinghua University, Peng Cheng Laboratory
Ying Shen
Sun Yat-sen University
Xin Su
Tencent