EXCGEC: A Benchmark for Edit-Wise Explainable Chinese Grammatical Error Correction

πŸ“… 2024-07-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing interpretability research in grammatical error correction (GEC) overlooks joint modeling of correction and explanation, and lacks a comprehensive Chinese evaluation benchmark. Method: We propose explanatory GEC (EXGEC), a novel task emphasizing co-modeling of correction and explanation, and introduce EXCGECβ€”the first Chinese edit-level interpretable GEC benchmark, comprising 8,216 samples annotated with hybrid edit explanations (operation type, position, and rationale). We design a multi-task learning framework supporting both pre- and post-hoc explanation generation, adopt METEOR and ROUGE for free-text explanation evaluation, and conduct human evaluation for validation. Results: Experiments reveal strong alignment between automatic metrics and human judgment; however, the multi-task model underperforms the pipeline approach, highlighting key challenges in joint modeling. This work provides three foundational contributions: a formal task definition for EXGEC, the EXCGEC benchmark resource, and a standardized evaluation paradigm for interpretable GEC.

πŸ“ Abstract
Existing studies explore the explainability of Grammatical Error Correction (GEC) only in limited scenarios: they ignore the interaction between corrections and explanations and have not established a corresponding comprehensive benchmark. To bridge this gap, this paper first introduces the task of EXplainable GEC (EXGEC), which focuses on the integral roles of the correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for Chinese EXGEC consisting of 8,216 explanation-augmented samples featuring hybrid edit-wise explanations. We then benchmark several series of LLMs in multi-task learning settings, including post-explaining and pre-explaining. To promote development of the task, we also build a comprehensive evaluation suite by leveraging existing automatic metrics and conducting human evaluation experiments to demonstrate the human consistency of the automatic metrics for free-text explanations. Our experiments reveal the effectiveness of evaluating free-text explanations with traditional metrics like METEOR and ROUGE, and the inferior performance of multi-task models compared to the pipeline solution, indicating the challenge of establishing positive effects when learning both tasks jointly.
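The abstract reports that traditional overlap metrics such as ROUGE work well for scoring free-text explanations against references. As a self-contained sketch (not the paper's evaluation code), here is a character-level ROUGE-L F1, which suits Chinese text since it avoids word segmentation:

```python
def lcs_len(a: str, b: str) -> int:
    """Longest-common-subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """Character-level ROUGE-L F1 between a generated explanation and a reference."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

In practice, released packages (e.g. `rouge-score`, or NLTK for METEOR) would be used for comparable numbers; this sketch only illustrates the underlying computation.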
Problem

Research questions and friction points this paper is trying to address.

Develop EXCGEC benchmark for Chinese EXGEC
Evaluate LLMs in multi-task learning settings
Assess free-text explanations using traditional metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces EXGEC task
Creates EXCGEC benchmark
Evaluates multi-task LLMs
Jingheng Ye
Tsinghua University
Shang Qin
Tsinghua University
Yinghui Li
Tsinghua University
Xuxin Cheng
University of California, San Diego
Libo Qin
Central South University
Hai-Tao Zheng
Tsinghua University, Peng Cheng Laboratory
Ying Shen
Peng Xing
Tsinghua University
Zishan Xu
Tsinghua University
Guo Cheng
Tsinghua University
Wenhao Jiang
GML, Tencent, PolyU
Computer Vision Β· Machine Learning Β· Foundation Models