🤖 AI Summary
This work addresses the limitation of existing large language model watermarking schemes, which rely on secret keys or service-provider-specific detectors and thus hinder independent third-party auditing. To overcome this, the authors propose TTP-Detect, a framework enabling key-free, non-invasive, black-box third-party watermark verification without access to model internals. The method enhances watermark signals using surrogate models and evaluates distribution alignment through multiple relative metrics, effectively decoupling watermark injection from detection. Experimental results demonstrate that TTP-Detect achieves strong detection performance and robustness against attacks across diverse watermarking schemes, datasets, and language models, establishing the first truly decentralized watermark auditing capability.
📝 Abstract
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or to provider-side, scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets, and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
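The abstract's "relative hypothesis testing" idea can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's actual method: `surrogate_score` stands in for a proxy-model watermark signal (here, a toy "green-token" fraction), and `relative_detect` flags a query as watermarked when its score distribution sits closer to a watermarked reference corpus than to a clean one.

```python
import statistics

def surrogate_score(text):
    # Hypothetical stand-in for a proxy-model watermark signal:
    # the fraction of tokens falling in a fixed "green" vocabulary.
    # A real surrogate would use model likelihoods, not a word list.
    green = {"the", "a", "of", "and", "to"}
    tokens = text.lower().split()
    return sum(t in green for t in tokens) / max(len(tokens), 1)

def relative_detect(query_texts, wm_ref, clean_ref):
    # Relative decision: is the query's mean score closer to the
    # watermarked reference distribution than to the clean one?
    q = statistics.mean(surrogate_score(t) for t in query_texts)
    wm = statistics.mean(surrogate_score(t) for t in wm_ref)
    cl = statistics.mean(surrogate_score(t) for t in clean_ref)
    return abs(q - wm) < abs(q - cl)  # True -> flag as watermarked
```

Because the decision compares the query against both reference distributions rather than testing an absolute threshold, no secret key or provider-side detector is needed; the paper's framework additionally combines multiple such relative measurements rather than a single mean score.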