Multilingual Grammatical Error Annotation: Combining Language-Agnostic Framework with Language-Specific Flexibility

📅 2025-06-09
🤖 AI Summary
Existing grammatical error annotation frameworks such as ERRANT struggle to balance cross-linguistic consistency with language-specific expressivity, particularly for morphologically rich and non-Indo-European languages. To address this, the paper proposes a standardized, modular framework for multilingual grammatical error annotation with a two-tier architecture: a language-agnostic base layer augmented by pluggable language-specific extensions. The authors reimplement ERRANT on top of Stanza, integrating dependency parsing, lemmatization, and linguistically motivated rule templates to support multi-level error categorization and configurable annotation pipelines. They demonstrate the framework on five typologically diverse languages (English, German, Czech, Korean, and Chinese), with applications ranging from general-purpose annotation to more customized linguistic refinements. The goal is more comparable and interpretable evaluation of Grammatical Error Correction (GEC) systems across languages. The complete toolchain is open source.

📝 Abstract
Grammatical Error Correction (GEC) relies on accurate error annotation and evaluation, yet existing frameworks, such as $\texttt{errant}$, face limitations when extended to typologically diverse languages. In this paper, we introduce a standardized, modular framework for multilingual grammatical error annotation. Our approach combines a language-agnostic foundation with structured language-specific extensions, enabling both consistency and flexibility across languages. We reimplement $\texttt{errant}$ using $\texttt{stanza}$ to support broader multilingual coverage, and demonstrate the framework's adaptability through applications to English, German, Czech, Korean, and Chinese, ranging from general-purpose annotation to more customized linguistic refinements. This work supports scalable and interpretable GEC annotation across languages and promotes more consistent evaluation in multilingual settings. The complete codebase and annotation tools can be accessed at https://github.com/open-writing-evaluation/jp_errant_bea.
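
To make "error annotation" concrete: ERRANT-style tools start from a pair of sentences (the learner's original and a corrected version), align them, and emit one edit per difference, labeled as Missing (M), Unnecessary (U), or Replacement (R). The sketch below illustrates only that extraction step. It is not the paper's implementation: it swaps in Python's standard `difflib.SequenceMatcher` for the linguistically informed alignment that ERRANT uses, tokenizes on whitespace, and the `Edit` and `extract_edits` names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class Edit(NamedTuple):
    """One edit between an original and a corrected sentence (hypothetical type)."""
    start: int        # first affected token index in the original (inclusive)
    end: int          # last affected token index in the original (exclusive)
    original: str     # original tokens joined by spaces ("" for an insertion)
    correction: str   # corrected tokens joined by spaces ("" for a deletion)
    op: str           # "M" = missing, "U" = unnecessary, "R" = replacement


def extract_edits(original: str, corrected: str) -> List[Edit]:
    """Align two whitespace-tokenized sentences and return the differing spans.

    Assumption: difflib's generic sequence alignment stands in for a
    linguistically weighted alignment, so edit boundaries may differ from
    what a real annotation tool would produce.
    """
    orig_tokens = original.split()
    corr_tokens = corrected.split()
    matcher = SequenceMatcher(a=orig_tokens, b=corr_tokens, autojunk=False)

    # Map difflib opcode names onto the M/U/R operation labels.
    op_names = {"insert": "M", "delete": "U", "replace": "R"}

    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical spans carry no error
        edits.append(
            Edit(
                start=i1,
                end=i2,
                original=" ".join(orig_tokens[i1:i2]),
                correction=" ".join(corr_tokens[j1:j2]),
                op=op_names[tag],
            )
        )
    return edits


if __name__ == "__main__":
    for edit in extract_edits("She go to school yesterday",
                              "She went to school yesterday"):
        print(edit)
```

Each extracted edit would then be passed to a classifier that assigns a linguistic error type; that classification step is where the language-agnostic and language-specific layers come in.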
Problem

Research questions and friction points this paper is trying to address.

Limitations of existing frameworks for multilingual grammatical error annotation
Need for standardized, modular framework combining consistency and flexibility
Supporting scalable, interpretable error correction across diverse languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized modular framework for multilingual annotation
Combines language-agnostic base with language-specific extensions
Uses stanza for broader multilingual coverage
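
One way to picture the combination of a language-agnostic base with language-specific extensions is a classifier that always assigns a coarse label from universal POS tags and then lets an optional per-language plug-in refine it. The sketch below is an illustration under stated assumptions, not the paper's code: the label strings, the `register` decorator, and the English rule are all hypothetical, and tokens are hand-written dictionaries whose `text`, `upos`, and `lemma` keys mirror attributes that Stanza's word objects expose, so the example runs without Stanza installed.

```python
from typing import Callable, Dict, Optional

Token = Dict[str, str]  # expected keys: "text", "upos", "lemma"

# A refiner takes (original token, corrected token, base label) and returns
# a more specific label, or None to keep the base label.
Refiner = Callable[[Token, Token, str], Optional[str]]

# Registry of language-specific extensions, keyed by language code.
EXTENSIONS: Dict[str, Refiner] = {}


def register(lang: str) -> Callable[[Refiner], Refiner]:
    """Decorator that plugs a refiner into the registry for one language."""
    def decorator(fn: Refiner) -> Refiner:
        EXTENSIONS[lang] = fn
        return fn
    return decorator


def base_classify(orig: Optional[Token], corr: Optional[Token]) -> str:
    """Language-agnostic tier: operation (M/U/R) plus universal POS tag."""
    if orig is None and corr is None:
        raise ValueError("an edit needs at least one token")
    if orig is None:
        return f"M:{corr['upos']}"   # token missing from the original
    if corr is None:
        return f"U:{orig['upos']}"   # token unnecessary in the original
    if orig["upos"] == corr["upos"]:
        return f"R:{orig['upos']}"   # replacement within one POS class
    return "R:OTHER"                 # replacement across POS classes


@register("en")
def refine_english(orig: Token, corr: Token, label: str) -> Optional[str]:
    """Hypothetical English rule: same verb lemma means a verb-form error."""
    if label == "R:VERB" and orig["lemma"] == corr["lemma"]:
        return "R:VERB:FORM"
    return None


def classify(lang: str, orig: Optional[Token], corr: Optional[Token]) -> str:
    """Run the base tier, then the language-specific tier if one exists."""
    label = base_classify(orig, corr)
    refiner = EXTENSIONS.get(lang)
    if refiner is not None and orig is not None and corr is not None:
        label = refiner(orig, corr, label) or label
    return label


if __name__ == "__main__":
    go = {"text": "go", "upos": "VERB", "lemma": "go"}
    went = {"text": "went", "upos": "VERB", "lemma": "go"}
    print(classify("en", go, went))  # refined by the English extension
    print(classify("ko", go, went))  # no extension registered: base label only
```

The design point is that a language without a registered extension still receives consistent base labels, while adding a new extension does not change the labels of any other language.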
Mengyang Qiu
Trent University, Canada
Tran Minh Nguyen
Open Writing Evaluation, France
Zihao Huang
Open Writing Evaluation, France
Zelong Li
Rutgers University
Automated Machine Learning, Recommendation System, Reinforcement Learning, Explainable AI
Yang Gu
Open Writing Evaluation, France
Qingyu Gao
Open Writing Evaluation, France
Siliang Liu
Open Writing Evaluation, France
Jungyeul Park
Open Writing Evaluation, France; University College London, UK