Breaking the Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the limited generalizability of existing AI-generated text detectors, which often rely on artifacts specific to particular language models and therefore fail on unseen ones. To overcome this, the authors propose a progressive disentanglement framework that, for the first time, integrates semantic-minimal latent encoding, perturbation regularization, and discriminative representation alignment. This approach decouples detection-relevant semantics from generator-specific artifacts, substantially improving cross-model generalization. On the MAGE benchmark, which covers 20 large language models, the method achieves up to a 24.2% improvement in accuracy and a 26.2% gain in F1 score over prior approaches. Its performance also improves consistently as the diversity of training generators increases, demonstrating robust scalability and adaptability.
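The three components named above can be illustrated with a toy training objective. This is a minimal sketch under loud assumptions, not the authors' method: the linear encoder, the L2 penalty standing in for semantic minimality, the Gaussian-noise consistency term standing in for perturbation regularization, and the logistic head standing in for discriminative alignment are all hypothetical choices, as are the names `encode`, `perturb`, `total_loss`, `beta`, `lam`, and `sigma`. The paper applies its components as progressive stages; here they are summed into one weighted loss for brevity.

```python
import math
import random

# Hypothetical toy objective illustrating the three components; not the authors' implementation.


def encode(x, W):
    """Stand-in compact encoder: linear projection of a d-dim feature vector to k dims (k < d)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def perturb(x, sigma, rng):
    """Assumed perturbation: add i.i.d. Gaussian noise with std `sigma` to every feature."""
    return [xj + rng.gauss(0.0, sigma) for xj in x]


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def logistic_loss(z, v, b, y):
    """Binary cross-entropy of a linear head on latent z; y = 1 means AI-generated."""
    p = sigmoid(sum(vi * zi for vi, zi in zip(v, z)) + b)
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def total_loss(batch, W, v, b, beta=0.01, lam=0.1, sigma=0.05, seed=0):
    """Mean over (features, label) pairs of minimality + perturbation + discriminative terms."""
    rng = random.Random(seed)  # fixed seed keeps the injected noise reproducible
    total = 0.0
    for x, y in batch:
        z = encode(x, W)
        z_noisy = encode(perturb(x, sigma, rng), W)
        # Stand-in minimality term: penalize the latent's squared norm.
        l_min = beta * sum(zi * zi for zi in z)
        # Stand-in perturbation term: the latent should barely move under input noise.
        l_pert = lam * sum((a - c) ** 2 for a, c in zip(z, z_noisy))
        # Stand-in discriminative term: the latent must still separate AI from human text.
        l_disc = logistic_loss(z, v, b, y)
        total += l_min + l_pert + l_disc
    return total / len(batch)


W = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]  # 3 input features -> 2 latent dims
v = [1.0, -1.0]
batch = [([1.0, 0.0, 2.0], 1), ([0.5, 1.0, -1.0], 0)]
print(total_loss(batch, W, v, 0.0))
```

Setting `beta` and `lam` to zero reduces the objective to a plain logistic detector on the latent, which makes the contribution of each regularizer easy to isolate. A staged schedule, for example turning on `lam` only after the encoder has been fitted, would more closely mirror the progressive structure described in the paper.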


📝 Abstract

As large language models (LLMs) generate text that increasingly resembles human writing, the subtle cues that distinguish AI-generated content from human-written content become harder to capture. Reliance on generator-specific artifacts is inherently unstable: new models emerge rapidly and erode the robustness of such shortcuts. This makes generalization to unseen generators a central and challenging problem for AI-text detection. To tackle it, we propose a progressively structured framework that disentangles AI-detection semantics from generator-aware artifacts. The framework proceeds in three stages: a compact latent encoding that encourages semantic minimality, perturbation-based regularization that reduces residual entanglement, and a discriminative adaptation stage that aligns representations with the task objective. Experiments on the MAGE benchmark, covering 20 representative LLMs across 7 categories, show consistent improvements over state-of-the-art methods, with gains of up to 24.2% in accuracy and 26.2% in F1. Notably, performance continues to improve as the diversity of training generators increases, confirming strong scalability and generalization in open-set scenarios. Our source code will be publicly available at https://github.com/PuXiao06/DRGD.
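To make "generalization to unseen generators" and the diversity trend concrete, the sketch below shows one generic leave-generators-out protocol. It is an assumption for illustration, not MAGE's official split: the record layout `(text, label, generator)`, the placeholder generator names `gen_a`, `gen_b`, and `gen_c`, the alternating assignment of human-written text, and the helper names `split_unseen_generators`, `diversity_schedule`, and `restrict_train` are all hypothetical.

```python
# Hypothetical leave-generators-out split; not the MAGE benchmark's official protocol.


def split_unseen_generators(samples, held_out):
    """Split (text, label, generator) records into train/test pools.

    Records whose generator is in `held_out` go only to test, so the detector
    never sees those generators during training. Human-written records
    (generator=None) alternate between pools so both contain negatives.
    """
    held_out = set(held_out)
    train, test = [], []
    n_human = 0
    for record in samples:
        gen = record[2]
        if gen is None:
            (train if n_human % 2 == 0 else test).append(record)
            n_human += 1
        elif gen in held_out:
            test.append(record)
        else:
            train.append(record)
    return train, test


def diversity_schedule(generators, held_out):
    """Nested lists of seen generators of growing size, for a training-diversity sweep."""
    excluded = set(held_out)
    seen = [g for g in generators if g not in excluded]
    return [seen[:k] for k in range(1, len(seen) + 1)]


def restrict_train(train, allowed):
    """Keep human records plus AI records from the `allowed` generators only."""
    allowed = set(allowed)
    return [r for r in train if r[2] is None or r[2] in allowed]


samples = [
    ("h1", 0, None),
    ("a1", 1, "gen_a"),
    ("h2", 0, None),
    ("b1", 1, "gen_b"),
    ("c1", 1, "gen_c"),
]
train, test = split_unseen_generators(samples, held_out=["gen_c"])
for gens in diversity_schedule(["gen_a", "gen_b", "gen_c"], ["gen_c"]):
    print(gens, len(restrict_train(train, gens)))
# ['gen_a'] 2
# ['gen_a', 'gen_b'] 3
```

Training a detector on each nested subset and scoring it on the fixed held-out pool would trace the kind of diversity curve the abstract reports; holding the test pool constant keeps the comparison across subsets fair.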

Problem

Research questions and friction points this paper is trying to address.

AI-text detection

generalizable detection

unseen generators

disentangled representation

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled representation

generalizable AI-text detection

generator-agnostic