🤖 AI Summary
To address the limitations of large language models (LLMs) in grammatical error correction—including heavy reliance on supervised fine-tuning, constrained reasoning capacity, and opaque, uncontrollable correction processes—this paper proposes a rule-guided reinforcement learning (RL) framework. Our method encodes domain-specific grammatical rules as differentiable constraints and integrates them into the RL training pipeline of an encoder-decoder architecture, optimizing correction policies via policy gradients rather than end-to-end generation. The key contributions are: (i) the first integration of explicit syntactic/grammatical rules with RL for grammatical error correction, enabling controllable inference and interpretable correction logic; and (ii) substantial improvements in recall for Chinese grammatical error correction, achieving state-of-the-art performance across multiple benchmark datasets. These results empirically validate that rule-guided RL effectively unlocks LLMs’ latent deep reasoning capabilities for structured linguistic tasks.
📝 Abstract
Grammatical error correction (GEC) is a significant task in NLP. Traditional methods based on encoder-decoder models have achieved some success, but the application of LLMs in this field remains underexplored. Current research predominantly relies on supervised fine-tuning to train LLMs to directly generate the corrected sentence, which limits the model's powerful reasoning ability. To address this limitation, we propose a novel framework based on Rule-Based RL. Through experiments on Chinese datasets, our Rule-Based RL framework achieves **state-of-the-art** performance, with a notable increase in **recall**. This result clearly highlights the advantages of using RL to steer LLMs, offering a more controllable and reliable paradigm for future development of GEC.
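The core idea of rule-guided RL for GEC is to score model outputs with a rule-aware reward rather than training on gold corrections alone. As a minimal sketch of what such a reward might look like: the paper's actual rule set, reward weights, and edit representation are not specified here, so the rules, function names, and weights below (`edit_recall`, `rule_penalty`, `alpha`, `beta`, the toy "的的" rule) are all hypothetical illustrations.

```python
# Hypothetical sketch of a rule-based reward for RL fine-tuning in GEC.
# The rules and weighting below are illustrative, not the paper's method.

def edit_recall(gold_edits: set, predicted_edits: set) -> float:
    """Fraction of gold-standard corrections the model actually made."""
    if not gold_edits:
        return 1.0
    return len(gold_edits & predicted_edits) / len(gold_edits)

def rule_penalty(sentence: str, rules) -> float:
    """Count rule violations; each rule is a predicate that flags an error."""
    return sum(1.0 for rule in rules if rule(sentence))

def rule_based_reward(sentence, gold_edits, predicted_edits, rules,
                      alpha=1.0, beta=0.5):
    """Scalar reward: recall bonus minus weighted rule violations.

    This scalar would be fed to a policy-gradient update (e.g. REINFORCE
    or PPO) over the LLM's generated correction.
    """
    return (alpha * edit_recall(gold_edits, predicted_edits)
            - beta * rule_penalty(sentence, rules))

# Toy rule: flag a doubled particle "的的" left uncorrected in the output.
rules = [lambda s: "的的" in s]

# A correction that applies the gold edit and violates no rule gets
# the full recall bonus.
reward = rule_based_reward("我的书", {("的的", "的")}, {("的的", "的")}, rules)
```

Because the reward is computed from explicit rules rather than a learned critic, each correction decision can be traced back to the rule that rewarded or penalized it, which is what makes the correction logic inspectable.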