Publicly Detectable Watermarking for Language Models

📅 2023-10-27

🏛️ IACR Cryptology ePrint Archive

📈 Citations: 38

✨ Influential: 7

🤖 AI Summary

Problem: Existing watermarking schemes for large language models (LLMs) are susceptible to forgery, offer limited detection reliability, and require a secret key for verification. Method: We propose the first publicly verifiable, distortion-free, unforgeable watermarking scheme for LLMs. Instead of relying on a secret detection key, our method embeds a publicly verifiable cryptographic signature into the generated tokens via rejection sampling and integrates LDPC error-correcting codes to improve robustness in low-entropy contexts. We provide formal proofs of unforgeability and distortion-freeness (the output cannot be distinguished from unwatermarked text without the public key). Results: Experiments demonstrate >99% watermark detection accuracy, strong resilience against editing, translation, rewriting, and other adversarial attacks, and consistent performance across diverse prompts. The theoretical claims (public verifiability, distortion-freeness, and robustness) are all validated empirically. This work addresses fundamental security and practicality bottlenecks that have hindered publicly verifiable watermarking in LLM applications.
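To make the rejection-sampling step concrete, here is a minimal sketch under strong simplifying assumptions: the "language model" is a hypothetical fixed distribution over a ten-word vocabulary, each payload bit is carried by a chunk of three tokens, and the carried bit is the low bit of a SHA-256 hash of the text generated so far. The payload bits only stand in for a signature; the actual signature scheme, chunking, and hashing used in the paper are not reproduced here, and the names `sample_chunk`, `chunk_bit`, `embed_bits`, and `extract_bits` are illustrative.

```python
import hashlib
import random

# Hypothetical toy "language model": a fixed next-token distribution.
# A real LM would condition each token on the preceding context.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "rug", "fast"]
WEIGHTS = [0.2, 0.15, 0.1, 0.1, 0.1, 0.1, 0.08, 0.07, 0.05, 0.05]


def sample_chunk(rng, n_tokens):
    """Sample n_tokens tokens independently from the toy distribution."""
    return rng.choices(VOCAB, weights=WEIGHTS, k=n_tokens)


def chunk_bit(prefix, chunk):
    """Public hash bit of a chunk, bound to all text generated before it."""
    data = " ".join(prefix + chunk).encode("utf-8")
    return hashlib.sha256(data).digest()[0] & 1


def embed_bits(bits, rng, tokens_per_bit=3, max_tries=64):
    """Rejection sampling: resample each chunk until its hash bit matches."""
    text = []
    for bit in bits:
        for _ in range(max_tries):
            chunk = sample_chunk(rng, tokens_per_bit)
            if chunk_bit(text, chunk) == bit:
                break
        else:
            # Low entropy: no sampled chunk produced the target bit.
            raise RuntimeError("not enough entropy to embed bit")
        text.extend(chunk)
    return text


def extract_bits(text, tokens_per_bit=3):
    """Recompute each chunk's hash bit; this needs no secret information."""
    bits = []
    for start in range(0, len(text), tokens_per_bit):
        chunk = text[start:start + tokens_per_bit]
        bits.append(chunk_bit(text[:start], chunk))
    return bits


if __name__ == "__main__":
    payload = [1, 0, 1, 1, 0, 0, 1, 0]  # stand-in for signature bits
    text = embed_bits(payload, random.Random(0))
    assert extract_bits(text) == payload
    print(" ".join(text))
```

Because extraction uses only a public hash function, anyone can recover the embedded bits; in a real scheme a verifier would then check them against the signer's public key. The `max_tries` failure path marks exactly where low-entropy text breaks naive rejection sampling.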

📝 Abstract

We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
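The error-correction idea can be illustrated with the simplest possible code. This sketch swaps in a repetition code with majority-vote decoding, which is not necessarily the code the paper uses (the summary above mentions LDPC codes). It models a low-entropy position as one where the sampler cannot steer the hash bit, so the extracted bit is an unbiased coin flip. The functions `rep_encode`, `rep_decode`, and `lossy_embed` are hypothetical names chosen for illustration.

```python
import random


def rep_encode(bits, r=5):
    """Repetition code: repeat each payload bit r times (r should be odd)."""
    return [b for b in bits for _ in range(r)]


def rep_decode(bits, r=5):
    """Majority vote over each group of r extracted bits."""
    out = []
    for i in range(0, len(bits), r):
        group = bits[i:i + r]
        out.append(1 if 2 * sum(group) > len(group) else 0)
    return out


def lossy_embed(codeword, low_entropy, rng):
    """Model embedding: where entropy is too low to steer the hash bit,
    the extracted bit is a coin flip instead of the target bit."""
    return [rng.randint(0, 1) if low else b
            for b, low in zip(codeword, low_entropy)]


if __name__ == "__main__":
    payload = [1, 0, 1, 1, 0, 0, 1, 0]               # stand-in for signature bits
    codeword = rep_encode(payload)                   # 40 bits to embed
    low = [i % 5 < 2 for i in range(len(codeword))]  # 2 of every 5 positions
    received = lossy_embed(codeword, low, random.Random(1))
    assert rep_decode(received) == payload           # at most 2 errors per group
    print("recovered:", rep_decode(received))
```

With r = 5, each group tolerates up to two corrupted bits, so decoding succeeds here even though 40% of positions are low entropy. Practical codes reach comparable tolerance with much less redundancy, which matters when every embedded bit costs several generated tokens.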

Problem

Research questions and friction points this paper is trying to address.

Watermarking

Text Authentication

Error Correction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transparent watermarking

Cryptographic signatures

Error correction

🔎 Similar Papers

Can Watermarked LLMs be Identified by Users via Crafted Prompts?