Distributional Information Embedding: A Framework for Multi-bit Watermarking

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses multi-bit watermarking for text generated by large language models (LLMs). Existing methods that embed a watermark into already-generated host text degrade text quality and lack rigorous information-theoretic foundations. Method: We propose a “distributional information embedding” paradigm, embedding a detectable signal by actively shaping the token probability distribution during generation rather than by post-hoc modification, and we formalize it within an information-theoretic framework that characterizes the fundamental trade-off among text fidelity (distortion), watermark detectability, and embedding rate. Contribution/Results: We prove that the maximum achievable watermark rate with vanishing error equals the entropy of the LLM’s output distribution and increases with the allowable distortion, and we characterize the watermarking scheme that achieves it. For the finite-token case, we leverage statistical hypothesis testing to identify detection strategies that maximize detection probability under constraints on the false-alarm rate and distortion.
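
To make the rate ceiling concrete: the claimed maximum rate is the Shannon entropy of the model's next-token distribution, and embedding works by reshaping that distribution rather than editing finished text. The Python sketch below is a minimal, hypothetical illustration of this idea, not the paper's construction; the hash-based vocabulary split, the `alpha` distortion knob, and the `embed_bit` helper are all invented for this example.

```python
import math
import random

def entropy_bits(p):
    """Shannon entropy in bits: the paper's ceiling on the watermark rate."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def embed_bit(p, bit, alpha=0.5):
    """Embed one message bit by reshaping the next-token distribution.

    The vocabulary is split into two halves (here by a plain hash, standing
    in for a keyed partition); the half matching `bit` is boosted by a factor
    of (1 + alpha) and the result is renormalized. alpha is the distortion
    knob: alpha = 0 leaves p untouched (zero rate), while larger alpha makes
    the bit more detectable at the cost of fidelity.
    """
    shaped = {tok: q * (1 + alpha) if (hash(tok) & 1) == bit else q
              for tok, q in p.items()}
    z = sum(shaped.values())
    return {tok: q / z for tok, q in shaped.items()}

# Toy next-token distribution.
p = {"the": 0.4, "a": 0.3, "cat": 0.2, "dog": 0.1}
print(f"entropy (rate ceiling) = {entropy_bits(p):.3f} bits")

shaped = embed_bit(p, bit=1)
print("shaped:", {tok: round(q, 3) for tok, q in shaped.items()})
print("sample:", random.choices(list(shaped), weights=list(shaped.values()), k=5))
```

In this toy setting, raising `alpha` makes the embedded bit easier to detect while pushing the shaped distribution further from the original, which mirrors the rate-detectability-distortion trade-off the summary describes.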

📝 Abstract
This paper introduces a novel problem, distributional information embedding, motivated by the practical demands of multi-bit watermarking for large language models (LLMs). Unlike traditional information embedding, which embeds information into a pre-existing host signal, LLM watermarking actively controls the text generation process, adjusting the token distribution, to embed a detectable signal. We develop an information-theoretic framework to analyze this distributional information embedding problem, characterizing the fundamental trade-offs among three critical performance metrics: text quality, detectability, and information rate. In the asymptotic regime, we demonstrate that the maximum achievable rate with vanishing error corresponds to the entropy of the LLM's output distribution and increases with higher allowable distortion. We also characterize the optimal watermarking scheme to achieve this rate. Extending the analysis to the finite-token case, we identify schemes that maximize detection probability while adhering to constraints on false alarm and distortion.
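
On the detection side, the abstract says the finite-token schemes maximize detection probability under false-alarm and distortion constraints. As a hedged stand-in for the paper's detector, the sketch below calibrates the threshold of a simple one-sided binomial test, the detector commonly paired with partition-based watermarks: under H0 (no watermark) each token falls in the keyed "green" half with probability 0.5, and we pick the smallest count whose false-alarm probability stays below a target. The `fa_rate` parameter and the green-token counting setup are assumptions for illustration only.

```python
from math import comb

def false_alarm(n, k, p0=0.5):
    """P[X >= k] under H0 (no watermark), X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

def detection_threshold(n, fa_rate=0.01):
    """Smallest green-token count k whose false-alarm probability <= fa_rate."""
    for k in range(n + 1):
        if false_alarm(n, k) <= fa_rate:
            return k
    return n + 1  # no threshold meets the target at this length

n = 200  # number of tokens observed
k = detection_threshold(n)
print(f"with n={n} tokens, declare watermarked if >= {k} green tokens "
      f"({k / n:.0%} of the text)")
```

When green-token membership is i.i.d. under both hypotheses, the likelihood ratio is monotone in the green count, so thresholding the count as above coincides with the optimal Neyman-Pearson test in this simplified binary setting.
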
Problem

Research questions and friction points this paper is trying to address.

Watermarking
Large Language Models
Text Quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional Information Embedding
Multi-bit Watermarking
Optimized Information Transmission