Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

📅 2026-03-16
🤖 AI Summary
This work addresses a limitation of existing large language model watermarking schemes: they rely on secret keys or provider-specific detectors, which hinders independent third-party auditing. The authors propose TTP-Detect, a framework for key-free, non-intrusive, black-box third-party watermark verification that requires no access to model internals. The method uses a proxy model to amplify watermark-relevant signals and assesses how well the query text aligns with watermarked distributions through multiple relative measurements, decoupling detection from injection. Experiments across representative watermarking schemes, datasets, and language models show strong detection performance and robustness against attacks, which the authors present as enabling watermark auditing by independent third parties.

📝 Abstract
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
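
The relative hypothesis testing idea in the abstract can be illustrated with a small sketch. This is not the TTP-Detect implementation, which the abstract does not specify. It assumes a proxy model has already reduced each text to a single scalar score, and that the auditor holds reference scores for known watermarked and known non-watermarked texts. The function names (`relative_score`, `is_watermarked`), the Gaussian fit, and the threshold are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def relative_score(query, watermarked_ref, clean_ref):
    """Log-likelihood ratio of a query score under two reference sets.

    Assumption: each reference set is summarized by a Gaussian fit.
    Positive values mean the query sits closer to the watermarked
    reference scores than to the non-watermarked ones.
    """
    mu_w, sd_w = mean(watermarked_ref), stdev(watermarked_ref)
    mu_c, sd_c = mean(clean_ref), stdev(clean_ref)
    return gaussian_logpdf(query, mu_w, sd_w) - gaussian_logpdf(query, mu_c, sd_c)


def is_watermarked(query, watermarked_ref, clean_ref, threshold=0.0):
    """Decide by comparing the relative score to a hypothetical threshold."""
    return relative_score(query, watermarked_ref, clean_ref) > threshold


# Toy proxy-model scores (made up for the example, not experimental data).
watermarked_scores = [2.0, 2.2, 1.8, 2.1]
clean_scores = [0.0, 0.2, -0.2, 0.1]
print(is_watermarked(1.9, watermarked_scores, clean_scores))
print(is_watermarked(0.05, watermarked_scores, clean_scores))
```

The point of the sketch is that the decision depends only on where the query falls relative to two reference distributions, so no secret key or provider-side detector is consulted. A real auditor would likely combine several such relative measurements rather than a single score.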

Problem

Research questions and friction points this paper is trying to address.

LLM watermarking
black-box detection
third-party verification
non-intrusive auditing
provenance verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

black-box watermark detection
third-party verification
proxy model
relative hypothesis testing
non-intrusive auditing

Authors

Zhuoshang Wang
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Yubing Ren
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Yanan Cao
Institute of Information Engineering, Chinese Academy of Sciences
Fang Fang
Professor, School of Psychological and Cognitive Sciences, Peking University
Visual Perception, Attention, Consciousness, Neuroimaging
Xiaoxue Li
National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC)
Li Guo
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China