Watermarking Low-entropy Generation for Large Language Models: An Unbiased and Low-risk Method

📅 2024-05-23

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Existing watermarking methods for large language models (LLMs) suffer from output distribution shift, degraded text quality, or reliance on white-box access, all of which hinder practical deployment for content provenance.

Method: We propose Sampling One Then Accepting (STA-1), a black-box watermarking mechanism based on the sampling–acceptance framework. It embeds watermarks without biasing the token distribution and requires no prompt modification, model parameter tuning, or architectural changes.

Contribution/Results: STA-1 is the first method to simultaneously achieve statistical unbiasedness, minimal quality degradation, detection that needs neither the prompt nor white-box access to the model, robustness against pruning and synonym substitution attacks, and high detection efficiency (>99% accuracy). Detection uses hypothesis testing with strict significance guarantees and runs with sub-10-ms latency. Imperceptibility and robustness are validated empirically on both high- and low-entropy text.

📝 Abstract

Recent advancements in large language models (LLMs) have highlighted the risk of their misuse, raising the need for accurate detection of LLM-generated content. One viable solution is to inject imperceptible identifiers, known as watermarks, into LLM output. Our research extends existing watermarking methods by proposing the novel Sampling One Then Accepting (STA-1) method. STA-1 is an unbiased watermark: it preserves the original token distribution in expectation, and it has a lower risk of producing unsatisfactory outputs in low-entropy scenarios than existing unbiased watermarks. For watermark detection, STA-1 does not require prompts or a white-box LLM, provides statistical guarantees, is efficient in detection time, and remains robust against various watermarking attacks. Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves the above properties simultaneously, making it a desirable solution for watermarking LLMs. The implementation code for this study is available online.
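To make "preserves the original token distribution in expectation" concrete, here is a minimal sketch. It rests on an assumption this page does not spell out: that the method name can be read literally, so each step draws one token, keeps it if it falls in a keyed "green" subset of the vocabulary, and otherwise draws once more and keeps that second token regardless. The function names, the green-set construction, and the toy distribution are all hypothetical and are not taken from the authors' code.

```python
import random
from itertools import combinations


def sta1_token_probs(p, green):
    """Exact output distribution of one sample-then-accept step.

    Assumed rule: the first draw is kept if it is green; otherwise a
    second draw is kept unconditionally.
    """
    red_mass = sum(q for i, q in enumerate(p) if i not in green)
    return [(q if i in green else 0.0) + red_mass * q for i, q in enumerate(p)]


def sta1_sample(p, green, rng):
    """Draw one token id under the same assumed rule."""
    ids = range(len(p))
    first = rng.choices(ids, weights=p)[0]
    if first in green:
        return first
    return rng.choices(ids, weights=p)[0]


def expected_over_green_sets(p, green_size):
    """Average the output distribution over every green set of a given size."""
    n = len(p)
    green_sets = [frozenset(c) for c in combinations(range(n), green_size)]
    avg = [0.0] * n
    for green in green_sets:
        for i, q in enumerate(sta1_token_probs(p, green)):
            avg[i] += q / len(green_sets)
    return avg


if __name__ == "__main__":
    p = [0.7, 0.2, 0.05, 0.05]  # toy low-entropy next-token distribution
    print(sta1_token_probs(p, frozenset({1, 2})))  # green tokens are boosted
    print(expected_over_green_sets(p, 2))          # averages back to p
    print(sta1_sample(p, frozenset({1, 2}), random.Random(0)))
```

For any single green set the output is tilted toward green tokens, which is what a detector can pick up. Averaged over uniformly chosen green sets the tilt cancels, which is the sense of "unbiased" in this sketch.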

Problem

Research questions and friction points this paper is trying to address.

Develops unbiased watermarking for LLMs

Enhances detection without prompts or white-box access

Ensures robustness against watermarking attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unbiased watermarking method

Low-risk output generation

Efficient, robust detection
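
As an illustration of how detection can work without the prompt or the model, the sketch below uses a one-proportion z-test on the number of tokens that fall in a keyed "green" subset of the vocabulary. This page does not state the test statistic STA-1 uses, so the green-set setup, the ratio `gamma`, and both function names are assumptions borrowed from common green-list watermark detection.

```python
import math


def green_z_score(num_green, num_tokens, gamma):
    """z-statistic for the green-token count.

    Null hypothesis (assumed): the text is unwatermarked, so each token is
    green independently with probability gamma.
    """
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (num_green - expected) / std


def one_sided_p_value(z):
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical counts: 75 green tokens out of 100 with gamma = 0.5.
    z = green_z_score(75, 100, 0.5)
    print(z, one_sided_p_value(z))
```

A detector of this shape needs only the token sequence and the key that defines the green sets. Its false-positive rate is set by the chosen z threshold, which is one way a statistical guarantee can be stated.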
