🤖 AI Summary
This work addresses the challenge of embedding high-capacity, imperceptible multi-bit watermarks into text generated by large language models. The authors propose a watermarking framework based on a block-autoregressive model, in which the encoder has limited non-causal access to the token distributions within each block, and use it to characterize the capacity of multi-bit watermarking exactly. They obtain this characterization by combining Gelfand-Pinsker coding with channel synthesis techniques. To optimize the embedding across text blocks, they formulate the problem as a constrained Markov decision process (CMDP) and develop an explicit polar-code-based encoding algorithm that follows these information-theoretic principles. Experiments show that the method achieves an embedding rate of 0.375 bits per token on short token sequences with a bit error rate below 10%, with negligible degradation in perplexity and distortion.
📝 Abstract
We study the problem of multi-bit watermarking for large language models (LLMs). We introduce a block-autoregressive model inspired by multi-token prediction, in which the encoder has limited non-causal access to token distributions within each block. This formulation enables an information-theoretic characterization of multi-bit watermarking capacity, in which knowledge of the LLM's cover statistics is leveraged to embed multiple bits covertly. We study the information-theoretic limits of the model by combining Gelfand-Pinsker and channel synthesis coding techniques and obtain an exact characterization of the capacity. The embedding strategy is further optimized across blocks using a constrained Markov decision process (CMDP), and we develop an explicit algorithm based on polar codes that follows the information-theoretic principles. Our algorithm achieves a bit-error rate below 10 percent at a rate of 0.375 bits/token on short token sequences, with negligible degradation in perplexity and distortion.
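The abstract names polar codes as the basis of the explicit algorithm but gives no construction details. As a rough illustration only, the sketch below shows the basic polar transform (Arıkan's kernel F = [[1, 0], [1, 1]] applied recursively over GF(2), without bit-reversal) and how the reported figures are computed. It is not the authors' algorithm: it includes no Gelfand-Pinsker binning, channel synthesis, or CMDP optimization. The function names, the block length of 8, the choice of information positions (5, 6, 7), and the mapping of one code position per token are all assumptions, chosen only so that 3 message bits per 8 positions matches the quoted rate of 0.375.

```python
from typing import List, Sequence


def polar_transform(u: Sequence[int]) -> List[int]:
    """Return x = u * F^{(x)n} over GF(2), with F = [[1, 0], [1, 1]].

    No bit-reversal permutation is applied. The input length must be a
    power of two.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [u[0] & 1]
    half = n // 2
    # For u = (a, b): u * G_n = ((a XOR b) * G_{n/2}, b * G_{n/2}).
    top = polar_transform([u[i] ^ u[i + half] for i in range(half)])
    bottom = polar_transform(list(u[half:]))
    return top + bottom


def encode_block(message_bits: Sequence[int],
                 info_positions: Sequence[int] = (5, 6, 7),
                 block_len: int = 8) -> List[int]:
    """Place message bits on the (assumed) information positions, freeze
    all other positions to 0, and apply the polar transform."""
    if len(message_bits) != len(info_positions):
        raise ValueError("one message bit per information position")
    u = [0] * block_len
    for pos, bit in zip(info_positions, message_bits):
        u[pos] = bit & 1
    return polar_transform(u)


def recover_noiseless(codeword: Sequence[int],
                      info_positions: Sequence[int] = (5, 6, 7)) -> List[int]:
    """Invert the encoder when no errors occurred. The transform is its
    own inverse over GF(2), so applying it again returns u."""
    u = polar_transform(codeword)
    return [u[pos] for pos in info_positions]


def bit_error_rate(sent: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of message bits that differ between sent and received."""
    if len(sent) != len(received) or not sent:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(sent, received)) / len(sent)


if __name__ == "__main__":
    message = [1, 0, 1]
    codeword = encode_block(message)
    rate = len(message) / len(codeword)  # message bits per code position
    print("codeword:", codeword)
    print("rate:", rate)
    print("BER (noiseless):", bit_error_rate(message, recover_noiseless(codeword)))
```

A practical decoder would use successive cancellation or a variant rather than the noiseless inversion shown here, and the information positions would be selected by reliability for the actual channel rather than fixed by hand.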