The Impact of Editorial Intervention on Detecting Native Language Traces

📅 2026-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how editorial intervention affects native language identification in human-AI collaborative writing. Using 450 essays from the Write & Improve 2024 corpus, the authors simulate increasing editing intensity through multi-stage grammatical error correction and paraphrasing, then evaluate how robust native language identification models remain on the edited texts. The work provides empirical evidence that native-language traces stem not only from surface-level errors but also from deeper features such as unidiomatic lexico-semantic choices, pragmatic transfer, and the author's cultural perspective. Minimal edits preserve these structural L1 traces and keep identification accuracy high, whereas fluency edits and paraphrasing normalize them, causing a severe drop in model performance.
📝 Abstract
Native Language Identification (NLI) is the task of determining an author's native language (L1) from their non-native writings. With the advent of human-AI co-authorship, non-native texts are routinely corrected and rewritten by large language models, fundamentally altering the linguistic features NLI models depend on. In this paper, we investigate the robustness of L1 traces across increasing degrees of editorial intervention. By processing 450 essays from the Write & Improve 2024 corpus through varying levels of grammatical error correction (GEC) and paraphrasing, we demonstrate that L1 attribution does not entirely depend on surface-level errors. Instead, the detection models leverage deeper L1 features: unidiomatic lexico-semantic choices, pragmatic transfer, and the author's underlying cultural perspective. We find that minimal edits preserve these structural traces and maintain high profiling accuracy. In contrast, fluency edits and paraphrasing normalize these L1 features, leading to a severe degradation in performance.
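The comparison described in the abstract amounts to scoring the same NLI predictions separately for each level of editorial intervention. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code: the function name, the edit-level labels ("original", "minimal", "paraphrase"), the L1 codes, and the toy records are all hypothetical and do not reflect the paper's data or results.

```python
from collections import defaultdict


def accuracy_by_edit_level(records):
    """Compute NLI accuracy separately for each editing level.

    `records` is an iterable of (edit_level, true_l1, predicted_l1) tuples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, true_l1, predicted_l1 in records:
        total[level] += 1
        correct[level] += int(true_l1 == predicted_l1)
    return {level: correct[level] / total[level] for level in total}


# Made-up toy records, NOT results from the paper.
toy_records = [
    ("original", "es", "es"),
    ("original", "zh", "zh"),
    ("minimal", "es", "es"),
    ("minimal", "zh", "es"),
    ("paraphrase", "es", "zh"),
    ("paraphrase", "zh", "es"),
]

scores = accuracy_by_edit_level(toy_records)
for level, acc in scores.items():
    print(f"{level}: {acc:.2f}")
```

Grouping by edit level keeps the essay set fixed across conditions, so any change in accuracy can be attributed to the editing step rather than to a different sample of authors.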
Problem

Research questions and friction points this paper is trying to address.

Native Language Identification
Editorial Intervention
L1 Traces
Human-AI Co-authorship
Grammatical Error Correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Native Language Identification
Editorial Intervention
L1 Transfer
Human-AI Co-authorship
Lexico-semantic Features