🤖 AI Summary
Existing grammatical error correction (GEC) evaluation metrics offer poor interpretability, making it difficult to pinpoint where a system falls short. To address this, we propose CLEME2.0—a reference-based, interpretable metric that, for the first time, decouples GEC evaluation into four semantically explicit and attributable edit categories: hit-corrections, wrong-corrections, under-corrections, and over-corrections. Leveraging rule-guided fine-grained alignment and a statistical assessment framework, CLEME2.0 classifies each system edit against the reference corrections. Evaluated on two human judgment datasets and six reference datasets, it achieves state-of-the-art correlation with human judgments, surpassing existing reference-based and reference-free metrics, and additionally enables system-level diagnostic analysis and targeted model improvement.
📝 Abstract
The paper focuses on the interpretability of Grammatical Error Correction (GEC) evaluation metrics, which has received little attention in previous studies. To bridge the gap, we introduce **CLEME2.0**, a reference-based metric describing four fundamental aspects of GEC systems: hit-correction, wrong-correction, under-correction, and over-correction. Together, these aspects expose critical qualities of GEC systems and locate their drawbacks. Evaluating systems by combining these aspects also yields higher human consistency than other reference-based and reference-less metrics. Extensive experiments on two human judgment datasets and six reference datasets demonstrate the effectiveness and robustness of our method, achieving a new state-of-the-art result. Our code is released at https://github.com/THUKElab/CLEME.
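The four aspects can be pictured as buckets for a system's edits relative to a reference. A minimal sketch, assuming edits are represented as `(start, end, replacement)` token spans and that spans are matched exactly — the function name and matching rule here are illustrative assumptions, not CLEME2.0's actual alignment algorithm:

```python
# Hedged illustration of the four edit categories from the abstract.
# Edits are (start, end, replacement) tuples over source tokens; the
# exact-span matching below is a simplification for clarity.

def classify_edits(hyp_edits, ref_edits):
    """Bucket a system's edits against one reference's edits."""
    ref_by_span = {(s, e): rep for s, e, rep in ref_edits}
    counts = {"hit": 0, "wrong": 0, "over": 0, "under": 0}
    hyp_spans = set()
    for s, e, rep in hyp_edits:
        hyp_spans.add((s, e))
        if (s, e) in ref_by_span:
            if ref_by_span[(s, e)] == rep:
                counts["hit"] += 1    # hit-correction: matches the reference
            else:
                counts["wrong"] += 1  # wrong-correction: right span, wrong fix
        else:
            counts["over"] += 1       # over-correction: edited an untouched span
    # under-correction: reference edits the system never attempted
    counts["under"] = sum(1 for span in ref_by_span if span not in hyp_spans)
    return counts

# Reference fixes spans (1,2) and (4,5); the system fixes (1,2)
# correctly, edits (7,8) needlessly, and misses (4,5).
ref = [(1, 2, "has"), (4, 5, "an")]
hyp = [(1, 2, "has"), (7, 8, "quickly")]
print(classify_edits(hyp, ref))  # {'hit': 1, 'wrong': 0, 'over': 1, 'under': 1}
```

Combining these counts into a score (e.g. weighting over- and under-corrections differently) is what lets the metric both rank systems and diagnose their specific failure modes.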