🤖 AI Summary
This work addresses the limitation of existing large language model watermarking schemes, which rely on secret keys or service-provider-specific detectors and thus hinder independent third-party auditing. To overcome this, the authors propose TTP-Detect, a framework enabling key-free, non-invasive, black-box third-party watermark verification without access to model internals. The method enhances watermark signals using surrogate models and evaluates distribution alignment through multiple relative metrics, effectively decoupling watermark injection from detection. Experimental results demonstrate that TTP-Detect achieves strong detection performance and robustness against attacks across diverse watermarking schemes, datasets, and language models, establishing the first truly decentralized watermark auditing capability.
📝 Abstract
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or to provider-side, scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets, and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
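The abstract's "relative hypothesis testing" idea can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's actual method: `surrogate_score` stands in for a proxy-model watermark signal (here, a toy "green-token" fraction), and `relative_detect` flags a query as watermarked when its score distribution sits closer to a watermarked reference corpus than to a clean one.

```python
import statistics

def surrogate_score(text):
    # Hypothetical stand-in for a proxy-model watermark signal:
    # the fraction of tokens falling in a fixed "green" vocabulary.
    # A real surrogate would use model likelihoods, not a word list.
    green = {"the", "a", "of", "and", "to"}
    tokens = text.lower().split()
    return sum(t in green for t in tokens) / max(len(tokens), 1)

def relative_detect(query_texts, wm_ref, clean_ref):
    # Relative decision: is the query's mean score closer to the
    # watermarked reference distribution than to the clean one?
    q = statistics.mean(surrogate_score(t) for t in query_texts)
    wm = statistics.mean(surrogate_score(t) for t in wm_ref)
    cl = statistics.mean(surrogate_score(t) for t in clean_ref)
    return abs(q - wm) < abs(q - cl)  # True -> flag as watermarked
```

Because the decision compares the query against both reference distributions rather than testing an absolute threshold, no secret key or provider-side detector is needed; the paper's framework additionally combines multiple such relative measurements rather than a single mean score.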