Reliability Crisis of Reference-free Metrics for Grammatical Error Correction

📅 2025-09-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work exposes a severe vulnerability of reference-free grammatical error correction (GEC) evaluation metrics under adversarial settings: although current metrics correlate well with human judgments, they are easily misled by deliberately optimized, low-quality corrections, undermining their reliability for automated assessment. To address this, we propose the first systematic framework for adversarial attacks targeting four major classes of reference-free metrics (SOME, Scribendi, IMPARA, and LLM-based metrics). Experimental results demonstrate that the generated adversarial corrections achieve statistically significant score improvements over state-of-the-art GEC systems across multiple metrics, despite severe degradation in both grammaticality and semantic fidelity. Our study empirically reveals a fundamental flaw in the reference-free evaluation paradigm and provides a reproducible benchmark to foster the development of robust, trustworthy GEC evaluation methodologies.

πŸ“ Abstract
Reference-free evaluation metrics for grammatical error correction (GEC) have achieved high correlation with human judgments. However, these metrics are not designed to evaluate adversarial systems that aim to obtain unjustifiably high scores. The existence of such systems undermines the reliability of automatic evaluation, as it can mislead users in selecting appropriate GEC systems. In this study, we propose adversarial attack strategies for four reference-free metrics: SOME, Scribendi, IMPARA, and LLM-based metrics, and demonstrate that our adversarial systems outperform the current state-of-the-art. These findings highlight the need for more robust evaluation methods.
Problem

Research questions and friction points this paper is trying to address.

Reference-free GEC metrics lack robustness against adversarial attacks
Adversarial systems can exploit metrics to achieve inflated scores
Current evaluation methods mislead users in selecting GEC systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial attack strategies targeting four reference-free metrics
Demonstrating adversarial systems outperform current state-of-the-art
Highlighting the need for more robust GEC evaluation methods
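The attack idea summarized above can be illustrated as a simple search loop that perturbs a correction solely to raise a metric's score. The sketch below is a hypothetical illustration, not the paper's actual method: `metric_score` is a toy stand-in for a learned reference-free metric (SOME, Scribendi, IMPARA, or an LLM judge would return a fluency/meaning score instead), and the edit operations are invented for demonstration.

```python
import random

def metric_score(sentence):
    # Toy proxy for a reference-free GEC metric (illustrative assumption):
    # rewards longer, comma-free outputs, with no regard for grammar or meaning.
    return len(sentence.split()) - sentence.count(",")

def adversarial_search(source, edits, steps=50, seed=0):
    """Greedy hill-climbing: apply random edits and keep any change that
    raises the metric score, regardless of grammaticality or fidelity."""
    rng = random.Random(seed)
    best = source
    best_score = metric_score(best)
    for _ in range(steps):
        candidate = rng.choice(edits)(best)
        score = metric_score(candidate)
        if score > best_score:  # accept only metric-improving perturbations
            best, best_score = candidate, score
    return best, best_score

# Hypothetical edit operations: pad with filler words, strip punctuation.
edits = [
    lambda s: s + " indeed",
    lambda s: s.replace(",", ""),
]

attacked, score = adversarial_search("He go , to school yesterday.", edits)
```

Because the loop optimizes the score directly, the output drifts toward whatever the metric rewards rather than toward a correct sentence, which is exactly the failure mode the paper's attacks expose on real metrics.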