SpecDetect: Simple, Fast, and Training-Free Detection of LLM-Generated Text via Spectral Analysis

📅 2025-08-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the problem of efficient, training-free detection of large language model (LLM)-generated text. Unlike existing approaches relying on superficial statistical features, we propose a novel signal-processing paradigm: modeling token-level log-probability sequences as generative signals and analyzing their spectral characteristics via global discrete Fourier transform (DFT) and local short-time Fourier transform (STFT). We discover that human-written text exhibits higher total energy in the frequency domain. Leveraging this insight, we design a single, robust feature—DFT total energy—and enhance its reliability through sampling-difference augmentation. The method requires no training, imposes minimal computational overhead, and achieves state-of-the-art performance across multiple benchmarks: it attains higher detection accuracy while reducing inference latency by nearly 50%. Our approach thus delivers high-precision, low-cost, plug-and-play text provenance identification.

📝 Abstract
The proliferation of high-quality text from Large Language Models (LLMs) demands reliable and efficient detection methods. While existing training-free approaches show promise, they often rely on surface-level statistics and overlook fundamental signal properties of the text generation process. In this work, we reframe detection as a signal processing problem, introducing a novel paradigm that analyzes the sequence of token log-probabilities in the frequency domain. By systematically analyzing the signal's spectral properties using the global Discrete Fourier Transform (DFT) and the local Short-Time Fourier Transform (STFT), we find that human-written text consistently exhibits significantly higher spectral energy. This higher energy reflects the larger-amplitude fluctuations inherent in human writing compared to the suppressed dynamics of LLM-generated text. Based on this key insight, we construct SpecDetect, a detector built on a single, robust feature from the global DFT: DFT total energy. We also propose an enhanced version, SpecDetect++, which incorporates a sampling discrepancy mechanism to further boost robustness. Extensive experiments demonstrate that our approach outperforms the state-of-the-art model while running in nearly half the time. Our work introduces a new, efficient, and interpretable pathway for LLM-generated text detection, showing that classical signal processing techniques offer a surprisingly powerful solution to this modern challenge.
Problem

Research questions and friction points this paper is trying to address.

Reliable, training-free detection of LLM-generated text
Existing training-free detectors rely on surface-level statistics and overlook signal-level properties of the generation process
High inference latency of current state-of-the-art detectors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral analysis of token log-probabilities
Global DFT and local STFT for detection
DFT total energy as robust feature
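The core feature above can be illustrated concretely. The sketch below (an assumption of this note, not the authors' released code) treats a token log-probability sequence as a discrete signal, applies a global DFT, and sums the squared spectral magnitudes; the function name `dft_total_energy` and the mean-centering step are illustrative choices, not taken from the paper.

```python
import numpy as np

def dft_total_energy(log_probs):
    """Total spectral energy of a token log-probability sequence.

    Treats per-token log-probabilities as a discrete signal, takes the
    global DFT, and sums squared magnitudes. The paper reports that
    human-written text tends to carry higher energy than LLM text.
    """
    signal = np.asarray(log_probs, dtype=float)
    # Remove the DC component so energy reflects fluctuations rather
    # than the mean log-probability level (a choice of this sketch).
    signal = signal - signal.mean()
    spectrum = np.fft.fft(signal)
    return float(np.sum(np.abs(spectrum) ** 2))

# Toy illustration with synthetic log-probabilities: a higher-variance
# "human-like" signal yields more spectral energy than a flatter
# "LLM-like" one of the same length (by Parseval's theorem, DFT energy
# scales with time-domain variance after mean removal).
rng = np.random.default_rng(0)
human_like = rng.normal(loc=-3.0, scale=2.0, size=256)
llm_like = rng.normal(loc=-1.5, scale=0.5, size=256)
print(dft_total_energy(human_like) > dft_total_energy(llm_like))
```

In practice the log-probabilities would come from scoring the text with a language model; a detector of this kind then thresholds the energy value to separate the two classes.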
Haitong Luo
Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Weiyao Zhang
Institute of Computing Technology, Chinese Academy of Sciences
Suhang Wang
Pennsylvania State University
Data mining, Machine learning, Deep Learning, Graph Mining
Wenji Zou
Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Chungang Lin
Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Xuying Meng
Institute of Computing Technology, Chinese Academy of Sciences
Yujun Zhang
Institute of Computing Technology, Chinese Academy of Sciences