Breaking the Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection

📅 2026-04-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limited generalizability of existing AI-generated text detectors, which often rely on artifacts specific to particular language models and fail to generalize to unseen ones. To overcome this, the authors propose a progressive disentanglement framework that, for the first time, integrates semantic-minimal latent encoding, perturbation regularization, and discriminative representation alignment. This approach effectively decouples semantic content from generator-specific artifacts, substantially enhancing cross-model generalization. Evaluated on the MAGE benchmark encompassing 20 large language models, the method achieves up to a 24.2% improvement in accuracy and a 26.2% gain in F1 score over prior approaches. Moreover, its performance consistently improves as the diversity of training generators increases, demonstrating robust scalability and adaptability.
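
As a rough illustration of what evaluating on unseen generators involves, the sketch below holds out every sample from selected generators at training time and then scores predictions with accuracy and F1. It is not the MAGE protocol or the authors' code: the sample fields, the generator names (`gen_a`, `gen_b`, `gen_c`), and the rule for splitting human-written samples are all assumptions made for the example.

```python
def split_by_generator(samples, held_out, human_test_every=2):
    """Split so that held-out generators never appear in the training set.

    Each sample is a dict with 'label' (1 = AI-written, 0 = human) and
    'generator' (a name, or None for human text). Field names are hypothetical.
    """
    train, test = [], []
    human_seen = 0
    for s in samples:
        if s["generator"] is None:
            # Human text has no generator; send every k-th sample to test.
            human_seen += 1
            (test if human_seen % human_test_every == 0 else train).append(s)
        elif s["generator"] in held_out:
            test.append(s)   # unseen generator: evaluation only
        else:
            train.append(s)  # seen generator: training only
    return train, test


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive (AI-written) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy data with hypothetical generator names.
samples = [
    {"label": 1, "generator": "gen_a"},
    {"label": 1, "generator": "gen_b"},
    {"label": 1, "generator": "gen_c"},
    {"label": 0, "generator": None},
    {"label": 0, "generator": None},
]
train, test = split_by_generator(samples, held_out={"gen_c"})
acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Keeping held-out generators entirely out of training is what separates this setting from an ordinary random split, in which a detector can score well by memorizing generator-specific artifacts.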

📝 Abstract
As large language models (LLMs) generate text that increasingly resembles human writing, the subtle cues that distinguish AI-generated content from human-written content become harder to capture. Reliance on generator-specific artifacts is inherently unstable, since new models emerge rapidly and reduce the robustness of such shortcuts. This makes generalization to unseen generators a central and challenging problem for AI-text detection. To tackle this challenge, we propose a progressively structured framework that disentangles AI-detection semantics from generator-aware artifacts. This is achieved through a compact latent encoding that encourages semantic minimality, followed by perturbation-based regularization to reduce residual entanglement, and finally a discriminative adaptation stage that aligns representations with task objectives. Experiments on the MAGE benchmark, covering 20 representative LLMs across 7 categories, demonstrate consistent improvements over state-of-the-art methods, achieving up to a 24.2% accuracy gain and a 26.2% F1 improvement. Notably, performance continues to improve as the diversity of training generators increases, confirming strong scalability and generalization in open-set scenarios. Our source code will be publicly available at https://github.com/PuXiao06/DRGD.
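
The abstract names three ingredients without giving formulas: a compact latent encoding, a perturbation-based regularizer, and a discriminative stage. The sketch below shows one generic way such pieces could combine into a single training loss. Everything in it is an assumption made for illustration, not the authors' formulation: the linear bottleneck, Gaussian noise as the perturbation, the squared-distance consistency penalty, the logistic head, and the weight `lam`.

```python
import math
import random


def encode(x, W):
    """Project input features x into a compact latent via a linear bottleneck W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def perturb(x, rng, scale=0.05):
    """Add small Gaussian noise as a stand-in for an input perturbation."""
    return [xi + rng.gauss(0.0, scale) for xi in x]


def consistency_penalty(z, z_perturbed):
    """Mean squared distance between clean and perturbed latents."""
    return sum((a - b) ** 2 for a, b in zip(z, z_perturbed)) / len(z)


def bce(z, head, label):
    """Binary cross-entropy of a linear head on the latent (label 1 = AI-written)."""
    logit = sum(h * zi for h, zi in zip(head, z))
    p = 1.0 / (1.0 + math.exp(-logit))
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp for numerical safety
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(x, label, W, head, rng, lam=0.5):
    """Detection loss plus a weighted penalty on latent drift under perturbation."""
    z = encode(x, W)
    z_pert = encode(perturb(x, rng), W)
    return bce(z, head, label) + lam * consistency_penalty(z, z_pert)


# Toy setup: 8 input features squeezed into a 2-dimensional latent.
rng = random.Random(0)
input_dim, latent_dim = 8, 2
W = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(latent_dim)]
head = [rng.uniform(-0.5, 0.5) for _ in range(latent_dim)]
x = [rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
loss = total_loss(x, 1, W, head, rng)
```

In this toy version the small latent dimension plays the role of the compact encoding, and the consistency term discourages the latent from reacting to small input changes. The paper describes its stages as progressive, so the actual method may train them in sequence rather than jointly as done here.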
Problem

Research questions and friction points this paper is trying to address.

AI-text detection
generalizable detection
unseen generators
disentangled representation
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled representation
generalizable AI-text detection
generator-agnostic
perturbation-based regularization
latent encoding