Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection

📅 2024-12-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing white-box methods for zero-shot LLM-generated text detection fail on closed-source, state-of-the-art LLMs due to lack of model access, while black-box approaches suffer from severely limited observability. Method: We propose Glimpse—a lightweight distribution reconstruction mechanism that extrapolates the full output token distribution solely from the partial probability information APIs provide (e.g., top-k log probabilities or sampled tokens), requiring no weight access, fine-tuning, or gradient computation. Contribution/Results: Glimpse enables, for the first time, zero-shot transfer of white-box metrics—including Entropy and Fast-DetectGPT—to closed-source models like GPT-3.5. Evaluated on five contemporary closed-source LLMs, Glimpse combined with Fast-DetectGPT achieves an average AUROC of 0.95—outperforming open-source baselines by 51% of the remaining discriminative margin—demonstrating that powerful LLMs can efficiently detect their own generated text.
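The summary does not spell out Glimpse's exact estimator, so the sketch below is illustrative only: it assumes a geometric-decay tail (function names and the decay model are this sketch's assumptions, not the paper's implementation) to extrapolate a full token distribution from top-k API probabilities, after which a white-box metric such as Entropy becomes computable.

```python
import math

def reconstruct_distribution(top_probs, vocab_size=1000):
    """Extrapolate a full token distribution from top-k API probabilities.

    Illustrative sketch: assume the unobserved tail decays geometrically
    and pick the decay ratio so the tail sums to the unobserved mass.
    """
    top_probs = sorted(top_probs, reverse=True)
    p_last = top_probs[-1]
    remaining = max(1.0 - sum(top_probs), 0.0)
    # Infinite geometric tail p_last * r^i (i >= 1) sums to
    # p_last * r / (1 - r); set that equal to the unobserved mass.
    r = remaining / (remaining + p_last)
    tail = [p_last * r ** (i + 1) for i in range(vocab_size - len(top_probs))]
    return top_probs + tail

def entropy(dist):
    """Shannon entropy in nats -- one white-box detection metric."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Example: five observed top probabilities covering 0.83 of the mass.
full = reconstruct_distribution([0.45, 0.2, 0.1, 0.05, 0.03])
print(round(sum(full), 6))   # the reconstruction sums to ~1.0
print(entropy(full) > 0)     # entropy is now computable
```

With the full distribution in hand, any metric that needs per-token probabilities (Entropy, Rank, Log-Rank, Fast-DetectGPT's curvature statistic) can be evaluated against a proprietary model's API output.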

📝 Abstract
Advanced large language models (LLMs) can generate text almost indistinguishable from human-written text, highlighting the importance of LLM-generated text detection. However, current zero-shot techniques face challenges: white-box methods are restricted to weaker open-source LLMs, and black-box methods are limited by partial observation of stronger proprietary LLMs. It seems impossible to enable white-box methods to use proprietary models because API-level access provides neither full predictive distributions nor inner embeddings. To bridge the divide, we propose **Glimpse**, a probability distribution estimation approach that predicts full distributions from partial observations. Despite the simplicity of Glimpse, we successfully extend white-box methods like Entropy, Rank, Log-Rank, and Fast-DetectGPT to the latest proprietary models. Experiments show that Glimpse with Fast-DetectGPT and GPT-3.5 achieves an average AUROC of about 0.95 on five of the latest source models, improving on the open-source baseline by 51% of the remaining headroom. This demonstrates that the latest LLMs can effectively detect their own outputs, suggesting that advanced LLMs may be the best shield against themselves. We release our code and data at https://github.com/baoguangsheng/glimpse.
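The "51% relative to the remaining space" phrasing measures improvement against the headroom left below a perfect AUROC of 1.0. A minimal worked example (the 0.898 baseline below is back-solved purely for illustration; the abstract reports only the 0.95 and the 51% figures):

```python
def relative_gain(auroc_new, auroc_base):
    """Improvement as a fraction of the remaining headroom to AUROC = 1.0."""
    return (auroc_new - auroc_base) / (1.0 - auroc_base)

# Hypothetical baseline of 0.898: (0.95 - 0.898) / (1 - 0.898) ~= 0.51,
# i.e. closing about half of the gap that the baseline left open.
print(round(relative_gain(0.95, 0.898), 2))
```

This headroom-relative framing makes gains near the AUROC ceiling visible, where absolute differences of a few points understate the difficulty.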
Problem

Research questions and friction points this paper is trying to address.

White-box detection metrics require full predictive distributions, which proprietary APIs do not expose.
Black-box methods see only partial observations from stronger proprietary LLMs, limiting detection accuracy.
Can full probability distributions be estimated from the partial information an API does provide?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Estimates full probability distributions from partial API observations
Extends white-box metrics (Entropy, Rank, Log-Rank, Fast-DetectGPT) to proprietary models
Shows proprietary LLMs can effectively detect their own outputs
🔎 Similar Papers
2024-05-07 · International Conference on Computational Linguistics · Citations: 3