🤖 AI Summary
Current AI text detectors overlook sociolinguistic attributes such as CEFR proficiency level, native-language background, gender, and academic discipline, leading to systematic detection bias and unfair author attribution. Method: Leveraging the ICNALE corpus of human-authored texts alongside parallel texts generated by multiple LLMs, we develop a joint analytical framework that combines multi-factor ANOVA with weighted least squares (WLS) regression to isolate and quantify the effects of these variables on detector performance. Contribution/Results: We provide the first empirical evidence that CEFR level and language background significantly reduce detection accuracy, while gender and disciplinary effects are detector-dependent. These findings challenge the assumption that detectors are agnostic to author demographics and argue for a socially aware, de-biased paradigm in AI text detection. The study establishes theoretical foundations and actionable guidelines (e.g., fairness-aware evaluation protocols and demographic-informed model calibration) for building equitable, robust, and generalizable detection systems.
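To make the WLS step of the framework concrete, here is a minimal sketch of a weighted least squares fit of detector accuracy against a coded CEFR band, with group sizes as weights. All numbers and group codings are illustrative placeholders, not results from ICNALE or the paper; a full analysis would use a library such as statsmodels and include the other factors.

```python
# Minimal WLS sketch: regress per-group detector accuracy on CEFR band,
# weighting each cell by its number of texts (larger groups count more).
# All values below are hypothetical, for illustration only.

def wls_fit(x, y, w):
    """Fit y ~ b0 + b1*x by weighted least squares (closed form)."""
    sw   = sum(w)
    swx  = sum(wi * xi for wi, xi in zip(w, x))
    swy  = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Solve the 2x2 weighted normal equations:
    # [[sw, swx], [swx, swxx]] @ [b0, b1] = [swy, swxy]
    det = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# Hypothetical cell means: accuracy per CEFR band (A2=0, B1=1, B2=2).
cefr_band = [0, 1, 2]
accuracy  = [0.71, 0.80, 0.88]
n_texts   = [120, 300, 180]   # weights = cell sizes

b0, b1 = wls_fit(cefr_band, accuracy, n_texts)
print(f"intercept={b0:.3f}, slope per CEFR band={b1:.3f}")
```

A positive slope in such a fit would indicate that detector accuracy rises with proficiency level, i.e., lower-proficiency writers are misclassified more often, which is the kind of demographic effect the paper quantifies.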
📝 Abstract
The rise of Large Language Models (LLMs) necessitates accurate detection of AI-generated text. However, current approaches largely overlook the influence of author characteristics. We investigate how sociolinguistic attributes (gender, CEFR proficiency, academic field, and language environment) impact state-of-the-art AI text detectors. Using the ICNALE corpus of human-authored texts and parallel AI-generated texts from diverse LLMs, we conduct a rigorous evaluation employing multi-factor ANOVA and weighted least squares (WLS) regression. Our results reveal significant biases: CEFR proficiency and language environment consistently affected detector accuracy, while gender and academic field showed detector-dependent effects. These findings highlight the crucial need for socially aware AI text detection that avoids unfairly penalizing specific demographic groups. We offer novel empirical evidence, a robust statistical framework, and actionable insights for developing more equitable and reliable detection systems in real-world, out-of-domain contexts. This work paves the way for future research on bias mitigation, inclusive evaluation benchmarks, and socially responsible LLM detectors.