Black-box Detection of LLM-generated Text Using Generalized Jensen-Shannon Divergence

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Detecting LLM-generated text in black-box settings—where the source model is unknown, surrogate models are mismatched, and contrastive generation is costly—remains challenging. Method: We propose SurpMark, a lightweight detection framework that avoids contrastive generation. It dynamically constructs a Markov state transition matrix based on token surprisal and employs the generalized Jensen–Shannon divergence to quantify distributional discrepancies between test texts and human/machine reference corpora. Theoretical analysis establishes the validity of its discretization criterion and the asymptotic normality of its test statistic. Results: Extensive experiments across multiple datasets, source LLMs, and diverse scenarios demonstrate that SurpMark consistently matches or surpasses state-of-the-art baselines. Ablation studies and statistical tests further confirm the efficacy of each component and validate theoretical convergence properties.

📝 Abstract
We study black-box detection of machine-generated text under practical constraints: the scoring model (proxy LM) may mismatch the unknown source model, and per-input contrastive generation is costly. We propose SurpMark, a reference-based detector that summarizes a passage by the dynamics of its token surprisals. SurpMark quantizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen-Shannon (GJS) gap between the test transitions and two fixed references (human vs. machine) built once from historical corpora. We prove a principled discretization criterion and establish the asymptotic normality of the decision statistic. Empirically, across multiple datasets, source models, and scenarios, SurpMark consistently matches or surpasses baselines; our experiments corroborate the statistic's asymptotic normality, and ablations validate the effectiveness of the proposed discretization.
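The pipeline described in the abstract (quantize token surprisals into discrete states, estimate a state-transition matrix for the test text, then score it by a GJS gap against fixed human and machine references) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the bin edges, Laplace smoothing, state count, and the row-averaged form of the GJS statistic are all simplifying assumptions.

```python
import numpy as np

def quantize_surprisals(surprisals, edges):
    """Map each token surprisal to a discrete state via fixed bin edges."""
    return np.digitize(surprisals, edges)  # states in 0..len(edges)

def transition_matrix(states, n_states):
    """Estimate a row-stochastic state-transition matrix (add-one smoothing)."""
    counts = np.ones((n_states, n_states))  # Laplace smoothing, assumed here
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def gjs(dists, weights):
    """Generalized Jensen-Shannon divergence: H(mixture) - weighted entropies."""
    mix = sum(w * d for w, d in zip(weights, dists))
    return entropy(mix) - sum(w * entropy(d) for w, d in zip(weights, dists))

def score(test_T, human_T, machine_T, w=(0.5, 0.5)):
    """Row-averaged GJS gap; higher means closer to the machine reference.

    The aggregation over transition rows is a placeholder choice, not
    necessarily the paper's decision statistic.
    """
    gjs_h = np.mean([gjs([test_T[i], human_T[i]], w) for i in range(len(test_T))])
    gjs_m = np.mean([gjs([test_T[i], machine_T[i]], w) for i in range(len(test_T))])
    return gjs_h - gjs_m
```

In this sketch the references `human_T` and `machine_T` would be built once from historical corpora, so scoring a new passage needs only one proxy-LM pass for its surprisals and no contrastive generation.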
Problem

Research questions and friction points this paper addresses.

Detect machine-generated text under model mismatch constraints
Reduce costly per-input contrastive generation requirements
Quantify surprisal dynamics via generalized Jensen-Shannon divergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses token surprisal dynamics for text detection
Quantizes surprisals into interpretable state transitions
Employs generalized Jensen-Shannon divergence for scoring
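For reference, the generalized Jensen-Shannon divergence in its standard form (the paper's exact weighting scheme may differ) for distributions $P_1,\dots,P_n$ with mixture weights $\pi$ is

```latex
\mathrm{GJS}_{\pi}(P_1,\dots,P_n)
  = H\!\left(\sum_{i=1}^{n} \pi_i P_i\right) - \sum_{i=1}^{n} \pi_i H(P_i),
\qquad \pi_i \ge 0,\quad \sum_{i=1}^{n} \pi_i = 1,
```

where $H$ denotes Shannon entropy. For $n = 2$ with equal weights this reduces to the ordinary Jensen-Shannon divergence, which is why a two-reference (human vs. machine) comparison fits naturally into this framework.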