A Watermark for Black-Box Language Models

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenge of watermarking black-box large language models (LLMs) that are accessed only through an API. It proposes a principled watermarking scheme that needs only the ability to sample text sequences from the model, with no access to the next-token probability distribution. The scheme is distortion-free and can be chained or nested using multiple secret keys. The authors provide performance guarantees, show how the scheme can be used when white-box access is available, and report experiments identifying settings in which it outperforms existing white-box schemes.

📝 Abstract
Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e. black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.
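To make the black-box idea concrete, here is a minimal sketch of one way a watermark could be embedded and detected using only sequence sampling. It is an illustration under stated assumptions, not the paper's algorithm: the names `keyed_uniform`, `watermarked_sample`, and `detection_z`, the whitespace tokenization, the bigram scoring, and the z-score test are all hypothetical choices made for this example. The sketch also does not attempt to reproduce the distortion-free guarantee described in the abstract.

```python
import hashlib
import hmac
import math
import random
from typing import Callable, List, Tuple

Sampler = Callable[[str], str]  # hypothetical stand-in for an LLM API call


def keyed_uniform(key: bytes, ngram: Tuple[str, ...]) -> float:
    """Map (key, n-gram) to a pseudorandom value in [0, 1) via HMAC-SHA256."""
    message = "\x1f".join(ngram).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # Keep 53 bits so the quotient is an exact float strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def ngram_scores(key: bytes, text: str, n: int = 2) -> List[float]:
    """Score each distinct whitespace-token n-gram of the text."""
    words = text.split()
    grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return [keyed_uniform(key, g) for g in sorted(grams)]


def mean_score(key: bytes, text: str, n: int = 2) -> float:
    """Average keyed score; 0.5 (the null mean) if there are no n-grams."""
    scores = ngram_scores(key, text, n)
    return sum(scores) / len(scores) if scores else 0.5


def watermarked_sample(sample: Sampler, prompt: str, key: bytes,
                       m: int = 16, n: int = 2) -> str:
    """Draw m candidates from the black-box sampler and keep the
    one whose n-grams score highest under the secret key."""
    candidates = [sample(prompt) for _ in range(m)]
    return max(candidates, key=lambda c: mean_score(key, c, n))


def detection_z(key: bytes, text: str, n: int = 2) -> float:
    """z-score of the mean n-gram score against Uniform[0, 1) scores
    (mean 1/2, variance 1/12); larger values suggest a watermark."""
    scores = ngram_scores(key, text, n)
    if not scores:
        return 0.0
    mean = sum(scores) / len(scores)
    return (mean - 0.5) * math.sqrt(12 * len(scores))


if __name__ == "__main__":
    # Toy sampler: random words stand in for real model output.
    rng = random.Random(0)
    vocab = [f"w{i}" for i in range(50)]

    def toy(prompt: str) -> str:
        return " ".join(rng.choice(vocab) for _ in range(30))

    secret = b"demo-secret-key"
    print("watermarked z:", detection_z(secret, watermarked_sample(toy, "hi", secret)))
    print("plain z:      ", detection_z(secret, toy("hi")))
```

Selecting among complete samples means the embedder never touches token probabilities, which is the property the abstract emphasizes. The cost is `m` API calls per output, so `m` trades detection strength against query budget in this sketch.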
Problem

Research questions and friction points this paper is trying to address.

Detect LLM outputs without white-box access
Develop distortion-free black-box watermarking
Enable multi-key chaining for watermarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box watermarking for LLMs
Distortion-free watermarking scheme
Support for multiple secret keys
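One plausible reading of multi-key nesting is that a watermarked sampler is itself a black-box sampler, so it can be wrapped again with a second key. The sketch below illustrates that reading only; `with_watermark`, `nest`, and the whole-text scoring rule are hypothetical names and simplifications, and the paper's actual construction may differ.

```python
import hashlib
import hmac
from typing import Callable, Sequence

Sampler = Callable[[str], str]  # hypothetical stand-in for an LLM API call


def keyed_uniform(key: bytes, text: str) -> float:
    """Map (key, text) to a pseudorandom value in [0, 1) via HMAC-SHA256."""
    digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def with_watermark(sample: Sampler, key: bytes, m: int = 4) -> Sampler:
    """Wrap a sampler: draw m candidates, keep the best under `key`."""
    def wrapped(prompt: str) -> str:
        candidates = [sample(prompt) for _ in range(m)]
        return max(candidates, key=lambda c: keyed_uniform(key, c))
    return wrapped


def nest(sample: Sampler, keys: Sequence[bytes], m: int = 4) -> Sampler:
    """Apply one wrapper per key; the first key ends up innermost."""
    for key in keys:
        sample = with_watermark(sample, key, m)
    return sample
```

With two keys and `m` candidates per layer, each output costs `m * m` calls to the underlying sampler in this sketch, so deeper nesting quickly becomes expensive.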