🤖 AI Summary
Current AI text detectors overlook sociolinguistic attributes such as CEFR proficiency level, native-language background, gender, and academic discipline, leading to systematic detection bias and unfair author attribution. Method: Leveraging the ICNALE corpus of human-authored texts alongside parallel texts generated by multiple LLMs, we develop a joint analytical framework that combines multi-factor ANOVA with weighted least squares (WLS) regression to isolate and quantify the effects of these variables on detector performance. Contribution/Results: We provide the first empirical evidence that CEFR level and language background significantly reduce detection accuracy, while gender and disciplinary effects are detector-dependent. These findings challenge the assumption that detectors are agnostic to author demographics and argue for a socially aware, de-biased paradigm in AI text detection. The study establishes theoretical foundations and actionable guidelines (e.g., fairness-aware evaluation protocols and demographic-informed model calibration) for building equitable, robust, and generalizable detection systems.
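To make the WLS step of the framework concrete, here is a minimal sketch of a weighted least squares fit of detector accuracy against a coded CEFR band, with group sizes as weights. All numbers and group codings are illustrative placeholders, not results from ICNALE or the paper; a full analysis would use a library such as statsmodels and include the other factors.

```python
# Minimal WLS sketch: regress per-group detector accuracy on CEFR band,
# weighting each cell by its number of texts (larger groups count more).
# All values below are hypothetical, for illustration only.

def wls_fit(x, y, w):
    """Fit y ~ b0 + b1*x by weighted least squares (closed form)."""
    sw   = sum(w)
    swx  = sum(wi * xi for wi, xi in zip(w, x))
    swy  = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Solve the 2x2 weighted normal equations:
    # [[sw, swx], [swx, swxx]] @ [b0, b1] = [swy, swxy]
    det = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# Hypothetical cell means: accuracy per CEFR band (A2=0, B1=1, B2=2).
cefr_band = [0, 1, 2]
accuracy  = [0.71, 0.80, 0.88]
n_texts   = [120, 300, 180]   # weights = cell sizes

b0, b1 = wls_fit(cefr_band, accuracy, n_texts)
print(f"intercept={b0:.3f}, slope per CEFR band={b1:.3f}")
```

A positive slope in such a fit would indicate that detector accuracy rises with proficiency level, i.e., lower-proficiency writers are misclassified more often, which is the kind of demographic effect the paper quantifies.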
📝 Abstract
The rise of Large Language Models (LLMs) necessitates accurate detection of AI-generated text. However, current approaches largely overlook the influence of author characteristics. We investigate how sociolinguistic attributes (gender, CEFR proficiency, academic field, and language environment) impact state-of-the-art AI text detectors. Using the ICNALE corpus of human-authored texts and parallel AI-generated texts from diverse LLMs, we conduct a rigorous evaluation employing multi-factor ANOVA and weighted least squares (WLS) regression. Our results reveal significant biases: CEFR proficiency and language environment consistently affected detector accuracy, while gender and academic field showed detector-dependent effects. These findings highlight the crucial need for socially aware AI text detection that avoids unfairly penalizing specific demographic groups. We offer novel empirical evidence, a robust statistical framework, and actionable insights for developing more equitable and reliable detection systems in real-world, out-of-domain contexts. This work paves the way for future research on bias mitigation, inclusive evaluation benchmarks, and socially responsible LLM detectors.