🤖 AI Summary
Existing large language model watermarking methods lack a unified theoretical framework, which has led to fragmented designs and hindered systematic comparison. The authors propose the first general framework based on constrained optimization, unifying mainstream watermarking schemes under a common theoretical foundation and revealing the intrinsic trade-offs among generation quality, diversity, and detectability. By incorporating metrics such as perplexity as quality constraints, the framework enables on-demand customization of watermarking algorithms. Experimental results show that watermarking schemes derived from this framework consistently achieve optimal detection performance under their respective constraints, validating its generality and effectiveness.
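The constrained-optimization view described above can be sketched in a generic form. The notation here is illustrative, not the paper's exact formulation: choose a watermarked sampling distribution $q$ that maximizes expected detection power, subject to a quality constraint bounding its divergence from the original model distribution $p$:

$$\max_{q}\ \mathrm{Power}(q)\qquad \text{s.t.}\qquad D(q \,\|\, p) \le \epsilon,$$

where $D$ stands in for whichever quality proxy is chosen (e.g., a KL divergence or, as the abstract mentions, a perplexity-based constraint), and different choices of $D$ and $\epsilon$ would recover different points on the quality-diversity-power trade-off.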
📝 Abstract
LLM watermarks enable tracing AI-generated text by embedding a detectable signal in a model's output. Recent works have proposed a wide range of watermarking algorithms, each with distinct designs, usually built using a bottom-up approach. Crucially, there is no general and principled formulation for LLM watermarking. In this work, we show that most existing and widely used watermarking schemes can in fact be derived from a principled constrained optimization problem. Our formulation unifies existing watermarking methods and explicitly reveals the constraints that each method optimizes. In particular, it highlights an understudied quality-diversity-power trade-off. At the same time, our framework also provides a principled approach for designing novel watermarking schemes tailored to specific requirements. For instance, it allows us to directly use perplexity as a proxy for quality, and derive new schemes that are optimal with respect to this constraint. Our experimental evaluation validates our framework: watermarking schemes derived from a given constraint consistently maximize detection power with respect to that constraint.
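Among the "existing and widely used watermarking schemes" the abstract refers to, a common example is the green/red-list scheme (often called KGW-style): at each step, a pseudorandom "green" subset of the vocabulary is derived from the previous token, green logits get a small bias, and a detector counts green hits via a z-test. The toy sketch below illustrates this idea only; it is not the paper's framework, and `VOCAB_SIZE`, `GAMMA`, and `DELTA` are arbitrary illustrative values.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary size (illustrative)
GAMMA = 0.5      # fraction of the vocabulary placed on the green list
DELTA = 2.0      # logit bias added to green tokens

def green_list(prev_token: int) -> set:
    """Seed a PRNG with a hash of the previous token and pick a
    GAMMA-fraction of the vocabulary as this step's green list."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def watermarked_sample(logits, prev_token, rng):
    """Add DELTA to green-token logits, then sample from the softmax."""
    green = green_list(prev_token)
    biased = [l + (DELTA if t in green else 0.0) for t, l in enumerate(logits)]
    m = max(biased)
    weights = [math.exp(l - m) for l in biased]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]

def detect(tokens):
    """z-score of the green-token count against the GAMMA baseline;
    large positive values indicate a watermark."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# Toy demo: flat logits, so only the watermark bias shapes the sample.
rng = random.Random(0)
tokens = [0]
for _ in range(200):
    tokens.append(watermarked_sample([0.0] * VOCAB_SIZE, tokens[-1], rng))
print(round(detect(tokens), 2))
```

The bias `DELTA` trades off exactly the quantities the abstract names: a larger bias raises detection power but distorts the sampling distribution (hurting quality and diversity), which is the trade-off the constrained-optimization formulation makes explicit.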