🤖 AI Summary
This work addresses the performance ceiling of current training-free AI-generated text detection methods, which rely on model log-probabilities (perplexity) and are limited because RLHF-optimized language models produce human-like probability distributions. The authors propose a detection signal based on character-level statistics and show that the divergence in letter distributions between human and AI texts significantly exceeds the divergence among AI texts, a gap they term the "Wall of Separation". Building on this, they introduce the Letter Distribution Score (LD-Score), a perplexity-independent metric. They also construct MDTA, a benchmark of 642,274 prompt-aligned samples spanning 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies. Experiments show that LD-Score has low correlation (r = 0.08-0.13) with perplexity-based methods; when fused with DNA-DetectLLM, Binoculars, and FastDetectGPT through a non-linear classifier, it yields consistent AUROC and F1 gains, most pronounced in specialized domains.
📝 Abstract
Training-free AI text detection methods primarily rely on model log-probabilities, achieving strong performance through approaches like Binoculars and DNA-DetectLLM. However, these methods face a fundamental ceiling as models are optimized through RLHF to produce human-like probability distributions. We introduce an alternative detection signal based on character distribution signatures. We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a "Wall of Separation" where human-AI divergence significantly exceeds AI-AI divergence. To enable systematic evaluation, we construct the Models-Domains-Temperatures-Adversarials (MDTA) benchmark comprising 642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies, substantially expanding the HC3 dataset with modern model responses, temperature variation, and adversarial augmentation. We introduce the Letter Distribution Score (LD-Score), demonstrating low correlation (r = 0.08-0.13) with perplexity methods. When integrated with DNA-DetectLLM, Binoculars and FastDetectGPT via a non-linear classifier, LD-Score yields consistent improvements in AUROC and F1, with particularly pronounced gains in specialized domains where vocabulary constraints amplify the detection signal. The MDTA dataset can be accessed at: https://huggingface.co/datasets/nsp909/MDTA.
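To make the idea of a character distribution signature concrete, the following is a minimal sketch of how a letter distribution can be computed and compared between two texts. The abstract does not give the exact LD-Score formula, so this example assumes Jensen-Shannon divergence over the 26 lowercase ASCII letters as a stand-in; the function names `letter_distribution` and `js_divergence` are hypothetical and not taken from the paper.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # assumption: only a-z are counted


def letter_distribution(text):
    """Return the relative frequency of each letter a-z in `text`."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No letters at all: fall back to a uniform distribution.
        return [1.0 / len(ALPHABET)] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with no overlapping support.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Illustrative comparison of a text against a reference text.
reference = letter_distribution("the quick brown fox jumps over the lazy dog")
candidate = letter_distribution("pack my box with five dozen liquor jugs")
score = js_divergence(candidate, reference)
```

Under this assumed formulation, the "Wall of Separation" claim would correspond to human-vs-AI divergences being consistently larger than AI-vs-AI divergences when measured against the same reference distributions. A symmetric, bounded measure is used here only because it makes scores comparable across texts of different lengths.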