AI Summary
This work addresses a key limitation of existing training-free AI-generated text detection methods, which assume that all tokens contribute uniformly and thus struggle with short texts or locally manipulated content. To overcome this, the authors propose an exon-aware token reweighting mechanism that identifies discriminative "exon" tokens by measuring hidden-state discrepancies between two off-the-shelf language models. These tokens are then assigned importance weights to construct a weighted sequence from which an interpretable detection score is computed. By relaxing the uniform-contribution assumption, the method achieves state-of-the-art performance without any training, yielding a 2.2% relative AUROC improvement on the DetectRL benchmark. Moreover, it demonstrates significantly enhanced robustness against adversarial attacks and variations in text length.
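The mechanism described above can be sketched in a few lines. The summary does not specify the discrepancy measure, the weighting function, or the aggregated per-token signal, so the sketch below makes illustrative assumptions: cosine distance between the two models' hidden states as the discrepancy, a softmax over tokens to turn discrepancies into importance weights, and per-token log-probabilities as the base signal. All function names and parameters are hypothetical, not the paper's API.

```python
import numpy as np

def exon_weights(h_det, h_ref, temperature=1.0):
    """Per-token importance weights from hidden-state discrepancy.

    h_det, h_ref: (T, d) hidden states for the same T tokens from two
    off-the-shelf language models (assumed projected to a shared
    dimension d). Tokens where the two models disagree most -- the
    "exonic" tokens -- receive higher weight.
    """
    # Cosine distance between the two models' hidden states, per token.
    dot = (h_det * h_ref).sum(axis=1)
    norms = np.linalg.norm(h_det, axis=1) * np.linalg.norm(h_ref, axis=1)
    discrepancy = 1.0 - dot / (norms + 1e-8)          # shape (T,)

    # Softmax over tokens: larger discrepancy -> larger weight.
    z = discrepancy / temperature
    z -= z.max()                                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

def detection_score(token_logprobs, weights):
    """Importance-weighted aggregate of per-token log-probabilities,
    replacing the uniform average used by prior training-free detectors."""
    return float(np.dot(weights, token_logprobs))

# Illustrative usage with synthetic hidden states and log-probabilities.
rng = np.random.default_rng(0)
h_a = rng.normal(size=(12, 8))        # model A hidden states, 12 tokens
h_b = rng.normal(size=(12, 8))        # model B hidden states
w = exon_weights(h_a, h_b)
score = detection_score(rng.normal(size=12), w)
```

With `temperature` large, the weights approach the uniform distribution and the score reduces to the ordinary token average; smaller temperatures concentrate the score on the most discriminative tokens, which is what makes the method less sensitive to short inputs and localized edits.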
Abstract
The rapid advancement of large language models has increasingly blurred the boundary between human-written and AI-generated text, raising societal risks such as misinformation dissemination, authorship ambiguity, and threats to intellectual property rights. These concerns highlight the urgent need for effective and reliable detection methods. While existing training-free approaches often achieve strong performance by aggregating token-level signals into a global score, they typically assume uniform token contributions, making them less robust under short sequences or localized token modifications. To address these limitations, we propose Exons-Detect, a training-free method for AI-generated text detection based on an exon-aware token reweighting perspective. Exons-Detect identifies and amplifies informative exonic tokens by measuring hidden-state discrepancy under a dual-model setting, and computes an interpretable translation score from the resulting importance-weighted token sequence. Empirical evaluations demonstrate that Exons-Detect achieves state-of-the-art detection performance and exhibits strong robustness to adversarial attacks and varying input lengths. In particular, it attains a 2.2% relative improvement in average AUROC over the strongest prior baseline on DetectRL.