🤖 AI Summary
This work addresses the problem of efficient, training-free detection of text generated by large language models (LLMs). Unlike existing approaches that rely on superficial statistical features, we propose a signal-processing paradigm: modeling the sequence of token-level log-probabilities as a signal and analyzing its spectral characteristics via the global discrete Fourier transform (DFT) and the local short-time Fourier transform (STFT). We find that human-written text exhibits higher total energy in the frequency domain. Leveraging this insight, we design a single, robust feature, DFT total energy, and enhance its reliability through sampling-difference augmentation. The method requires no training, imposes minimal computational overhead, and achieves state-of-the-art performance across multiple benchmarks: it attains higher detection accuracy while reducing inference latency by nearly 50%. Our approach thus delivers high-precision, low-cost, plug-and-play text provenance identification.
📝 Abstract
The proliferation of high-quality text from Large Language Models (LLMs) demands reliable and efficient detection methods. While existing training-free approaches show promise, they often rely on surface-level statistics and overlook fundamental signal properties of the text generation process. In this work, we reframe detection as a signal processing problem, introducing a novel paradigm that analyzes the sequence of token log-probabilities in the frequency domain. By systematically analyzing the signal's spectral properties using the global Discrete Fourier Transform (DFT) and the local Short-Time Fourier Transform (STFT), we find that human-written text consistently exhibits significantly higher spectral energy. This higher energy reflects the larger-amplitude fluctuations inherent in human writing compared to the suppressed dynamics of LLM-generated text. Based on this key insight, we construct SpecDetect, a detector built on a single, robust feature from the global DFT: DFT total energy. We also propose an enhanced version, SpecDetect++, which incorporates a sampling discrepancy mechanism to further boost robustness. Extensive experiments demonstrate that our approach outperforms the state-of-the-art model while running in nearly half the time. Our work introduces a new, efficient, and interpretable pathway for LLM-generated text detection, showing that classical signal processing techniques offer a surprisingly powerful solution to this modern challenge.
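The core feature described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact implementation: the function name, the mean-centering step, and the toy sequences are assumptions made for clarity. It computes the total spectral energy of a token log-probability sequence via the DFT, and shows how a sequence with larger-amplitude fluctuations (a stand-in for human writing) carries more energy than one with suppressed dynamics (a stand-in for LLM output).

```python
import numpy as np

def dft_total_energy(logprobs):
    """Total spectral energy of a token log-probability sequence.

    The sequence is mean-centered so the constant (DC) offset does not
    dominate; the remaining energy reflects fluctuation amplitude.
    """
    x = np.asarray(logprobs, dtype=float)
    x = x - x.mean()                    # remove DC component
    spectrum = np.fft.rfft(x)           # one-sided DFT of a real signal
    return float(np.sum(np.abs(spectrum) ** 2))

# Toy illustration (synthetic data, not real model log-probabilities):
# larger fluctuations yield higher DFT total energy.
rng = np.random.default_rng(0)
human_like = -2.0 + 1.5 * rng.standard_normal(256)  # large swings
model_like = -2.0 + 0.3 * rng.standard_normal(256)  # suppressed dynamics
assert dft_total_energy(human_like) > dft_total_energy(model_like)
```

A real detector would obtain the log-probabilities from a scoring LLM and threshold the resulting energy value; by Parseval's theorem, this energy is tied to the variance of the sequence, which is what makes the feature interpretable.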