Can We Still Hear the Accent? Investigating the Resilience of Native Language Signals in the LLM Era

📅 2026-03-20

🤖 AI Summary
This study investigates whether the widespread adoption of large language models (LLMs) is homogenizing academic writing and erasing linguistic traces of authors’ native languages. Using a temporally stratified dataset of ACL Anthology papers spanning the pre-neural-network, pre-LLM, and post-LLM eras, the authors build a labeled corpus through semi-automated annotation and fine-tune a classifier to measure how native-language signals erode. Native language identification (NLI) performance declines consistently over time, but the post-LLM era shows notable cross-linguistic variation: Chinese and French show unexpected resistance or divergent trends, whereas Japanese and Korean decline more sharply than expected. These findings suggest that LLMs affect academic writing unevenly across native languages.

📝 Abstract
The evolution of writing assistance tools from machine translation to large language models (LLMs) has changed how researchers write. This study investigates whether this shift is homogenizing research papers by analyzing native language identification (NLI) trends in ACL Anthology papers across three eras: pre-neural network (NN), pre-LLM, and post-LLM. We construct a labeled dataset using a semi-automated framework and fine-tune a classifier to detect linguistic fingerprints of author backgrounds. Our analysis shows a consistent decline in NLI performance over time. Interestingly, the post-LLM era reveals anomalies: while Chinese and French show unexpected resistance or divergent trends, Japanese and Korean exhibit sharper-than-expected declines.
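To make "NLI performance over time" concrete, here is a minimal sketch. It assumes that each paper receives one predicted native-language label and that performance is scored as per-language recall within each era. This summary does not state the paper's actual metric or label set, so `per_era_recall`, `macro_recall`, and the `(era, true_language, predicted_language)` record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (era, true_native_language, predicted_native_language)
Record = Tuple[str, str, str]


def per_era_recall(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Recall of an NLI classifier for each native language, computed separately per era.

    Recall for a language = papers correctly labeled with that language
                            / papers whose true label is that language.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    hits: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for era, gold, pred in records:
        totals[era][gold] += 1
        if pred == gold:
            hits[era][gold] += 1
    return {
        era: {lang: hits[era][lang] / count for lang, count in langs.items()}
        for era, langs in totals.items()
    }


def macro_recall(per_language: Dict[str, float]) -> float:
    """Unweighted mean of per-language recall, so each language counts equally."""
    return sum(per_language.values()) / len(per_language)
```

Under these assumptions, a falling `macro_recall` from the pre-NN era to the post-LLM era would correspond to the overall decline the abstract reports. Comparing the per-language values within an era is what would surface differences such as Japanese and Korean versus Chinese and French.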

Problem

Research questions and friction points this paper is trying to address.

native language identification
large language models
writing homogenization
linguistic fingerprints
author background

Innovation

Methods, ideas, or system contributions that make the work stand out.

native language identification
large language models
writing homogenization
linguistic fingerprint
ACL Anthology