🤖 AI Summary
Current AI-generated text detectors aggregate token-level features into scalar scores, discarding spatial information about where anomalies occur and thus exhibiting poor robustness against local adversarial perturbations. We identify that AI-generated texts exhibit pronounced non-stationarity—inter-segment statistical divergence is 73.8% higher than in human-written texts—revealing the root cause of this fragility. To address this, we propose Temporal Discrepancy Tomography (TDT), which reformulates detection as a signal processing task: it applies the Continuous Wavelet Transform to token-level discrepancy sequences, yielding a two-dimensional time-scale representation that preserves both positional structure and multi-scale linguistic anomalies. Evaluated on the RAID benchmark, TDT achieves an AUROC of 0.855, outperforming the best baseline by 7.1%. Against HART Level 2 paraphrasing attacks, it improves AUROC by 14.1%, with only a 13% increase in computational overhead.
📝 Abstract
The field of AI-generated text detection has evolved from supervised classification to zero-shot statistical analysis. However, current approaches share a fundamental limitation: they aggregate token-level measurements into scalar scores, discarding positional information about where anomalies occur. Our empirical analysis reveals that AI-generated text exhibits significant non-stationarity: statistical properties vary 73.8% more between text segments than in human writing. This discovery explains why existing detectors fail against localized adversarial perturbations that exploit this overlooked characteristic. We introduce Temporal Discrepancy Tomography (TDT), a novel detection paradigm that preserves positional information by reformulating detection as a signal processing task. TDT treats token-level discrepancies as a time-series signal and applies the Continuous Wavelet Transform to generate a two-dimensional time-scale representation, capturing both the location and linguistic scale of statistical anomalies. On the RAID benchmark, TDT achieves 0.855 AUROC (a 7.1% improvement over the best baseline). More importantly, TDT demonstrates robust performance on adversarial tasks, with a 14.1% AUROC improvement on HART Level 2 paraphrasing attacks. Despite its sophisticated analysis, TDT maintains practical efficiency with only 13% computational overhead. Our work establishes non-stationarity as a fundamental characteristic of AI-generated text and demonstrates that preserving temporal dynamics is essential for robust detection.
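To make the signal-processing reformulation concrete, here is a minimal sketch (not the paper's implementation) of the core idea: treat per-token discrepancy scores as a 1-D signal and expand it into a 2-D time-scale map with a continuous wavelet transform. The discrepancy values, the injected local anomaly, the hand-rolled real Morlet wavelet, and all parameters below are illustrative assumptions.

```python
import numpy as np

def morlet(t, w0=5.0):
    # Real Morlet mother wavelet: a cosine-modulated Gaussian.
    return np.cos(w0 * t) * np.exp(-t**2 / 2)

def cwt_scalogram(signal, scales):
    # Continuous wavelet transform by convolving the signal with
    # dilated wavelets; returns a (num_scales, len(signal)) map in
    # which rows are scales and columns are token positions.
    out = np.empty((len(scales), len(signal)))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1) / s   # wavelet support ~8 scale units
        w = morlet(t) / np.sqrt(s)             # L2-style normalization
        out[i] = np.convolve(signal, w, mode="same")
    return out

# Synthetic per-token discrepancy sequence (e.g. log-likelihood gaps);
# a localized bump mimics a paraphrased span inside otherwise
# consistent text -- the kind of anomaly a scalar average washes out.
rng = np.random.default_rng(0)
d = rng.normal(0.0, 0.05, 256)
d[100:120] += 1.0                              # local anomaly at tokens 100-119

S = cwt_scalogram(d, scales=np.arange(1, 32))
print(S.shape)                                 # (31, 256): scale x position
```

Because the scalogram keeps the position axis, the anomalous span shows up as a localized high-energy region around columns 100–120 across several scales, whereas summing `d` into a single score would dilute it; a detector can then operate on this 2-D representation instead of a scalar.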