🤖 AI Summary
This work addresses over-correction in grammatical error correction (GEC) with large language models, which often reduces accuracy. The authors propose a training-free inference method that, for the first time, applies majority voting at the edit level across multiple candidates generated by a single model. The approach requires no model modification or additional training and is compared against greedy decoding and Minimum Bayes Risk (MBR) decoding. Experiments on nine benchmark datasets spanning seven languages show that the method outperforms both decoding strategies in most cases and remains stable across different instruction prompts.
📝 Abstract
Grammatical error correction (GEC) using large language models often suffers from over-correction. To mitigate this, we propose a training-free inference method that performs edit-level majority voting over multiple candidates generated by a single model, without requiring model modifications or additional training. Across nine benchmarks covering English, Czech, German, Ukrainian, Korean, Hindi, and Romanian, the proposed method outperforms both greedy and MBR decoding in most cases. Moreover, it yields stable correction quality regardless of the instruction prompt used. We release two repositories supporting GEC dataset loading and LLM inference.
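The idea of edit-level majority voting can be illustrated with a minimal sketch. This is not the released implementation: it assumes whitespace tokenization, uses Python's `difflib` as a stand-in edit extractor, and keeps an edit only when more than half of the candidates propose it. The function names `extract_edits` and `majority_vote` and the `threshold` parameter are hypothetical choices made for this illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_edits(source_tokens, candidate_tokens):
    """Return the set of (start, end, replacement) edits that turn
    source_tokens into candidate_tokens. Spans index into the source.
    Assumption: difflib alignment stands in for a real GEC edit extractor."""
    matcher = SequenceMatcher(a=source_tokens, b=candidate_tokens, autojunk=False)
    edits = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.add((i1, i2, tuple(candidate_tokens[j1:j2])))
    return edits


def majority_vote(source, candidates, threshold=0.5):
    """Apply to the source only those edits proposed by more than
    `threshold` of the candidates (hypothetical default: strict majority)."""
    src = source.split()
    counts = Counter()
    for cand in candidates:
        # A set per candidate, so each candidate votes once per edit.
        counts.update(extract_edits(src, cand.split()))

    kept = [e for e, c in counts.items() if c / len(candidates) > threshold]
    # Apply right to left so earlier source indices stay valid.
    kept.sort(key=lambda e: (e[0], e[1]), reverse=True)

    out = list(src)
    prev_start = len(src) + 1
    for start, end, repl in kept:
        if end > prev_start:
            continue  # skip an edit that overlaps one already applied
        out[start:end] = repl
        prev_start = start
    return " ".join(out)


source = "She go to school yesterday ."
candidates = [
    "She went to school yesterday .",
    "She went to the school yesterday .",  # extra, unsupported insertion
    "She went to school yesterday .",
]
print(majority_vote(source, candidates))
```

In this toy case all three candidates agree on replacing "go" with "went", while only one inserts "the", so the insertion is dropped. That is the intended effect on over-correction: edits a single sample proposes in isolation do not reach the output.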