Publicly Detectable Watermarking for Language Models

📅 2023-10-27
🏛️ IACR Cryptology ePrint Archive
📈 Citations: 38
Influential: 7
🤖 AI Summary
Problem: Existing watermarking schemes for large language models (LLMs) are susceptible to forgery, struggle to detect the watermark reliably in low-entropy text, and rely on a secret key for verification. Method: We propose a publicly verifiable, distortion-free, unforgeable watermarking scheme for LLMs. Instead of relying on a secret detection key, our method embeds a publicly verifiable cryptographic signature into the model's output via rejection sampling, and uses error-correcting codes to remain robust through low-entropy stretches of text. We provide formal proofs of unforgeability and distortion-freeness. Results: Experiments demonstrate >99% watermark detection accuracy, strong resilience against editing, translation, rewriting, and other adversarial attacks, and consistent performance across diverse prompts. The theoretical claims (public verifiability, distortion-freeness, and robustness) are reported to hold in the implementation. This work addresses security and practicality bottlenecks that have hindered publicly verifiable watermarking in LLM applications.
📝 Abstract
We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
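The embedding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' construction: the language model is replaced by a hypothetical uniform sampler over a toy vocabulary, the payload is an arbitrary bit string standing in for a publicly verifiable signature, and the names `toy_model_sample`, `block_bit`, `embed`, `extract`, and the block length `BLOCK` are all assumptions made for illustration. Each payload bit is embedded by resampling a block of tokens until a public hash of the text so far yields that bit; extraction needs only the hash, so anyone can run it.

```python
import hashlib
import random

# Hypothetical toy vocabulary; a real scheme would sample from an LM.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "tree"]
BLOCK = 4  # tokens used to carry one payload bit (assumed value)


def toy_model_sample(rng, context, n):
    """Stand-in for a language model: draws n tokens uniformly, ignoring context."""
    return [rng.choice(VOCAB) for _ in range(n)]


def block_bit(prev_tokens, block):
    """Map the text so far plus the candidate block to one bit with a public hash."""
    data = " ".join(prev_tokens + block).encode()
    return hashlib.sha256(data).digest()[0] & 1


def embed(bits, rng, max_tries=1000):
    """Rejection sampling: resample each block until its hash bit equals the payload bit."""
    tokens = []
    for bit in bits:
        for _ in range(max_tries):
            cand = toy_model_sample(rng, tokens, BLOCK)
            if block_bit(tokens, cand) == bit:
                tokens.extend(cand)
                break
        else:
            # With too little entropy no candidate may ever match; this is the
            # failure mode that error correction is meant to absorb.
            raise RuntimeError("could not embed bit: not enough entropy")
    return tokens


def extract(tokens):
    """Public extraction: recompute the hash bit of every block, no secret needed."""
    return [
        block_bit(tokens[:i], tokens[i:i + BLOCK])
        for i in range(0, len(tokens) - BLOCK + 1, BLOCK)
    ]


payload = [1, 0, 1, 1, 0, 0, 1, 0]  # placeholder for signature bits
text = embed(payload, random.Random(0))
recovered = extract(text)
```

In the scheme the abstract describes, the recovered bits would then be checked as a signature under a public key, which is what makes forgery hard; this sketch covers only the embedding and extraction steps. The `RuntimeError` branch marks where a low-entropy stretch would leave a bit unembedded, which is the gap an error-correcting code over the payload is intended to cover.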
Problem

Research questions and friction points this paper is trying to address.

Watermarking
Text Authentication
Error Correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transparent watermarking
Cryptographic signature
Error correction
🔎 Similar Papers
2024-06-17
North American Chapter of the Association for Computational Linguistics
Citations: 2