RE$^2$: Improving Chinese Grammatical Error Correction via Retrieving Appropriate Examples with Explanation

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Chinese Grammatical Error Correction (CGEC) approaches predominantly retrieve reference sentences by text similarity. This produces spurious matches (lexically similar but syntactically irrelevant examples) that hinder precise identification of error patterns.

Method: We propose an error-explanation-driven exemplar retrieval paradigm that (1) constructs the first high-quality Chinese dataset annotated with fine-grained grammatical error explanations; (2) designs a dual-alignment retrieval mechanism grounded in both semantic similarity and error-type consistency; and (3) integrates error explanations into the prompts given to large language models.

Contribution/Results: The method significantly improves the model's understanding and correction of grammatical errors, achieving absolute F₀.₅ gains of +2.3 on SIGHAN15 and +1.9 on SIGHAN14 and surpassing current state-of-the-art methods. To support reproducibility, we publicly release our code, dataset, and prompt templates as a new benchmark and resource for CGEC research.
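The reported gains are in F₀.₅, the usual GEC metric, which combines edit-level precision P and recall R as F_β = (1 + β²)·P·R / (β²·P + R). Setting β = 0.5 weights precision more heavily than recall, reflecting the convention that a wrong correction is worse than a missed one. The sketch below computes the score from edit counts; the counts are made up, and the step of matching system edits against gold edits (done by a dedicated scorer in practice) is assumed to have already happened.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta over edits; beta < 1 weights precision more heavily than recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical edit counts: 6 correct edits, 2 spurious edits, 6 missed edits.
print(round(f_beta(6, 2, 6), 4))            # F0.5 → 0.6818
print(round(f_beta(6, 2, 6, beta=1.0), 4))  # F1   → 0.6
```

With P = 0.75 and R = 0.5, F₀.₅ sits above F₁ because the system is more precise than it is complete.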

📝 Abstract
The primary objective of Chinese grammatical error correction (CGEC) is to detect and correct errors in Chinese sentences. Recent studies have applied large language models (LLMs) to CGEC with significant results, and selecting appropriate reference examples can further improve LLM performance. However, existing methods predominantly rely on text similarity for example retrieval, a strategy that often fails to match the actual error patterns and instead retrieves lexically similar yet grammatically irrelevant sentences. To address this problem, we propose RE$^2$, a method that retrieves appropriate examples using explanations of grammatical errors. Instead of relying on the text similarity of the input sentence, we use grammatical error explanations to select reference examples, which LLMs then use to improve CGEC performance. We conduct experiments on two CGEC datasets and create a high-quality grammatical error explanation (GEE) dataset, which is not only used in our research but also serves as a valuable resource for future studies in both CGEC and GEE. The experimental results on both datasets indicate that the proposed method effectively improves CGEC performance.
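The contrast between the two retrieval strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: a bag-of-characters cosine similarity stands in for whatever retriever RE$^2$ actually uses, the three example sentences and their English explanations are invented for illustration, and the explanation of the input sentence is assumed to be given (in practice it would have to be generated, e.g. by an LLM). `Example`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Example:
    source: str       # erroneous sentence
    target: str       # corrected sentence
    explanation: str  # grammatical error explanation (GEE)


def tokenize(text: str) -> Counter:
    # Stand-in encoder (assumption): single CJK characters plus lowercase Latin/digit words.
    return Counter(re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, pool: list[Example], field: str, k: int = 1) -> list[Example]:
    # field="source": text-similarity baseline; field="explanation": explanation-based idea.
    q = tokenize(query)
    return sorted(pool, key=lambda ex: cosine(q, tokenize(getattr(ex, field))), reverse=True)[:k]


def build_prompt(sentence: str, examples: list[Example]) -> str:
    # Hypothetical few-shot prompt showing each example together with its explanation.
    parts = ["Correct the grammatical errors in the following Chinese sentence."]
    for ex in examples:
        parts.append(f"Input: {ex.source}\nExplanation: {ex.explanation}\nOutput: {ex.target}")
    parts.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(parts)


# Invented example pool for illustration only.
pool = [
    Example("大约有一百人左右参加了会议。", "大约有一百人参加了会议。",
            "redundant component: 大约 and 左右 both express approximation"),
    Example("通过这次活动，使我们受到了教育。", "这次活动使我们受到了教育。",
            "missing subject: the prepositional phrase with 通过 cannot be the subject of 使"),
    Example("他进步很快的原因是因为老师的帮助。", "他进步很快的原因是老师的帮助。",
            "redundant component: 原因是 and 因为 have the same meaning"),
]

sentence = "经过老师的帮助，使他进步很快。"
# Hypothetical explanation for the input; assumed to be produced beforehand.
explanation = "missing subject: the prepositional phrase with 经过 cannot be the subject of 使"

by_text = retrieve(sentence, pool, "source")
by_expl = retrieve(explanation, pool, "explanation")
print(by_text[0].source)  # → 他进步很快的原因是因为老师的帮助。 (a redundancy error)
print(by_expl[0].source)  # → 通过这次活动，使我们受到了教育。 (same missing-subject error)
print(build_prompt(sentence, by_expl))
```

Retrieving by source text picks the redundancy example because it shares many characters (他, 进步, 很快, 老师, 帮助) with the input, while retrieving by explanation picks the missing-subject example, whose error pattern actually matches the input.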
Problem

Improving Chinese grammatical error correction by retrieving relevant examples
Addressing limitations of text-similarity-based example retrieval methods
Using grammatical error explanations to select appropriate reference examples
Innovation

Uses grammatical error explanations for example retrieval
Creates high-quality grammatical error explanation dataset
Improves Chinese grammatical error correction performance
Baoxin Wang
iFLYTEK Research
Large Language Models, Grammatical Error Correction, Natural Language Processing
Yumeng Luo
Artificial Intelligence and Human Language Lab, Beijing Foreign Studies University, Beijing 100089, China
Yixuan Wang
Research Center for SCIR, Harbin Institute of Technology, Harbin 150001, China
Dayong Wu
State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, Hefei 230088, China
Wanxiang Che
Professor, Harbin Institute of Technology
Natural Language Processing
Shijin Wang
Tongji University
Scheduling, Maintenance