Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poor generalization of AI-generated text detection by reframing human-written text as out-of-distribution (OOD) samples, a paradigm shift from conventional binary classification. Methodologically, the authors propose an end-to-end OOD detection framework leveraging one-class learning models (e.g., DeepSVDD, HRN) combined with energy-based scoring functions. Evaluated on the DeepFake dataset, the approach achieves 98.3% AUROC and AUPR, with an FPR@95 of only 8.9%. Crucially, it demonstrates strong cross-lingual, adversarial, and cross-model robustness, generalizing effectively to unseen large language models (e.g., GPT-4, Claude, DeepSeek) and multilingual settings. The core contribution is the first systematic reformulation of AI-text detection as an OOD problem, which significantly enhances generalization to unknown models and domains while preserving detection accuracy.

📝 Abstract
The rapid advancement of large language models (LLMs) such as ChatGPT, DeepSeek, and Claude has significantly increased the presence of AI-generated text in digital communication. This trend has heightened the need for reliable detection methods to distinguish between human-authored and machine-generated content. Existing approaches, both zero-shot methods and supervised classifiers, largely conceptualize this task as a binary classification problem, often leading to poor generalization across domains and models. In this paper, we argue that such a binary formulation fundamentally mischaracterizes the detection task by assuming a coherent representation of human-written texts. In reality, human texts do not constitute a unified distribution, and their diversity cannot be effectively captured through limited sampling. As a result, previous classifiers memorize observed OOD characteristics rather than learn the essence of `non-ID' behavior, limiting generalization to unseen human-authored inputs. Based on this observation, we propose reframing the detection task as an out-of-distribution (OOD) detection problem, treating human-written texts as distributional outliers and machine-generated texts as in-distribution (ID) samples. To this end, we develop a detection framework using one-class learning methods, including DeepSVDD and HRN, and score-based learning techniques such as energy-based methods, enabling robust and generalizable performance. Extensive experiments across multiple datasets validate the effectiveness of our OOD-based approach. Specifically, the OOD-based method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on the DeepFake dataset. Moreover, we test our detection framework in multilingual, adversarially attacked, and unseen-model and unseen-domain settings, demonstrating its robustness and generalizability. Code, pretrained weights, and a demo will be released.
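The one-class idea in the abstract, DeepSVDD-style detection, can be sketched in miniature: fit a hypersphere center to embeddings of ID (machine-generated) texts, then score new texts by their squared distance to that center. This is an illustrative sketch, not the paper's implementation; it assumes a frozen text encoder that already produces fixed-dimensional embeddings, and the function names are hypothetical.

```python
def hypersphere_center(id_embeddings):
    """Center c = mean of the embeddings of ID (machine-generated) texts.

    In DeepSVDD proper, the encoder is also trained to pull ID points
    toward c; here we only illustrate the scoring geometry.
    """
    dim = len(id_embeddings[0])
    n = len(id_embeddings)
    return [sum(e[d] for e in id_embeddings) / n for d in range(dim)]

def svdd_score(embedding, center):
    """Anomaly score = squared Euclidean distance to the center.

    A large distance suggests an outlier, i.e. a human-written text
    under the paper's framing.
    """
    return sum((x - c) ** 2 for x, c in zip(embedding, center))

# Toy usage with 2-d "embeddings":
center = hypersphere_center([[0.0, 0.0], [2.0, 2.0]])  # -> [1.0, 1.0]
near = svdd_score([1.0, 1.0], center)   # 0.0, looks in-distribution
far = svdd_score([3.0, 3.0], center)    # 8.0, flagged as an outlier
```

A detection threshold on this score would be calibrated on held-out machine-generated data, mirroring how the FPR95 metric fixes a 95% true-positive operating point.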
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated texts via out-of-distribution detection
Treating human texts as outliers versus machine-generated content
Improving generalization across domains and unseen models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reframing detection as OOD problem
Using one-class learning methods like DeepSVDD
Applying energy-based score learning techniques
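The energy-based scoring mentioned above can be sketched as follows: the energy of an input is the negative temperature-scaled log-sum-exp of the model's logits, so confident (ID, machine-generated) inputs get low energy and outliers (human-written) get high energy. This is a minimal sketch of the standard energy score, not the paper's exact pipeline; the logits source and the `flag_human` helper are assumptions for illustration.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy score E(x) = -T * log(sum_i exp(f_i(x) / T)).

    Lower energy indicates a more in-distribution input
    (machine-generated, under the paper's framing); higher energy
    suggests an outlier (human-written).
    """
    t = temperature
    m = max(l / t for l in logits)  # shift by the max to stabilize logsumexp
    return -t * (m + math.log(sum(math.exp(l / t - m) for l in logits)))

def flag_human(logits, threshold, temperature=1.0):
    """Flag a text as human-written (OOD) when its energy exceeds a
    threshold calibrated on machine-generated (ID) validation data."""
    return energy_score(logits, temperature) > threshold

# A confident logit vector yields lower energy than a flat one:
confident = energy_score([10.0, 0.0])  # close to -10
uncertain = energy_score([0.0, 0.0])   # exactly -log(2)
```

In practice the threshold would be chosen so that 95% of machine-generated validation texts fall below it, matching the FPR95 evaluation reported in the abstract.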