More Haste, Less Speed: Weaker Single-Layer Watermark Improves Distortion-Free Watermark Ensembles

2026-02-12
Citations: 0
Influential: 0
AI Summary
This work addresses a limitation of existing watermarking methods: excessively strong single-layer watermarks reduce the entropy of the token distribution and thereby degrade the effectiveness of multi-layer ensembles. To overcome this, we propose a novel "weak per-layer, strong overall" watermarking paradigm: by weakening the watermark at each individual layer, we preserve both distributional entropy and the proportion of green-list tokens, enhancing overall detectability and robustness. Grounded in information-theoretic analysis, we derive theoretical bounds on watermark detectability, construct an integrated framework for weak single-layer watermarks, and explain the counterintuitive mechanism by which overly strong watermarks impair ensemble performance. Experimental results demonstrate that our approach effectively mitigates signal attenuation and significantly outperforms existing strong-watermark baselines in both detectability and robustness.
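A minimal sketch of why strong layers erode entropy, under stated assumptions: the summary does not give the paper's per-layer rule, so each layer is modelled here as a DiPmark-style α-reweighting (a known distortion-free scheme, swapped in for illustration only), layers are stacked by feeding one layer's output distribution into the next with an independent random key, and the helper names (`reweight`, `expected_entropy_one_layer`, `simulate_layers`) are hypothetical. Because such a layer is distortion-free (its output averages back to the input over keys) and entropy is concave, Jensen's inequality implies that the expected entropy after a layer cannot exceed the entropy before it; the sketch compares how large that drop is for a weak (α = 0.1) versus the strongest (α = 0.5) setting.

```python
import itertools
import math
import random


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def reweight(p, perm, alpha):
    """One watermark layer, ASSUMED here to be DiPmark-style alpha-reweighting.

    Tokens are visited in the key-dependent order `perm`. The running CDF F is
    mapped to g(F) = max(F - alpha, 0) + max(F - (1 - alpha), 0), and each token
    receives the increment of g. alpha = 0 leaves p unchanged; alpha = 0.5 is
    the strongest setting. Averaged over all perms, the output equals p.
    """
    out = [0.0] * len(p)
    cdf, prev = 0.0, 0.0
    for tok in perm:
        cdf += p[tok]
        mapped = max(cdf - alpha, 0.0) + max(cdf - (1.0 - alpha), 0.0)
        out[tok] = mapped - prev
        prev = mapped
    return out


def expected_entropy_one_layer(p, alpha):
    """Exact E_key[H(P_W)] by enumerating every permutation (small vocab only)."""
    perms = list(itertools.permutations(range(len(p))))
    return sum(entropy(reweight(p, perm, alpha)) for perm in perms) / len(perms)


def simulate_layers(p, alpha, n_layers, n_trials=2000, seed=0):
    """Monte Carlo mean entropy after each of n_layers stacked layers.

    Each layer uses an independent random key (a shuffled vocabulary order).
    Index 0 of the result is the entropy before any watermarking.
    """
    rng = random.Random(seed)
    totals = [0.0] * (n_layers + 1)
    for _ in range(n_trials):
        q = list(p)
        totals[0] += entropy(q)
        for layer in range(1, n_layers + 1):
            perm = list(range(len(q)))
            rng.shuffle(perm)
            q = reweight(q, perm, alpha)
            totals[layer] += entropy(q)
    return [t / n_trials for t in totals]


if __name__ == "__main__":
    p = [0.4, 0.3, 0.2, 0.1]
    for alpha in (0.1, 0.5):
        trace = simulate_layers(p, alpha, n_layers=4)
        print(f"alpha={alpha}: " + ", ".join(f"{h:.3f}" for h in trace))
```

For the two-token distribution [0.7, 0.3], exact enumeration gives an expected one-layer entropy of about 0.337 nats at α = 0.5 versus about 0.587 nats at α = 0.1, against 0.611 nats before watermarking.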

πŸ“ Abstract
Watermarking has emerged as a crucial technique for detecting and attributing content generated by large language models. While recent advancements have utilized watermark ensembles to enhance robustness, prevailing methods typically prioritize maximizing the strength of the watermark at every individual layer. In this work, we identify a critical limitation in this "stronger-is-better" approach: strong watermarks significantly reduce the entropy of the token distribution, which paradoxically weakens the effectiveness of watermarking in subsequent layers. We theoretically and empirically show that detectability is bounded by entropy and that watermark ensembles induce a monotonic decrease in both entropy and the expected green-list ratio across layers. To address this inherent trade-off, we propose a general framework that utilizes weaker single-layer watermarks to preserve the entropy required for effective multi-layer ensembling. Empirical evaluations demonstrate that this counter-intuitive strategy mitigates signal decay and consistently outperforms strong baselines in both detectability and robustness.
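A second sketch illustrates the claim that detectability is bounded by entropy, again under assumptions not taken from the paper: each layer is a DiPmark-style α-reweighting used as a stand-in, the green list for a key is hypothetically defined as the last half of the key-permuted vocabulary, and `expected_green_ratio` is a hypothetical helper that averages the green-list mass exactly over all keys of a small vocabulary. In this model a zero-entropy (one-hot) distribution cannot be reweighted at all, so its expected green ratio stays at the unwatermarked value of 1/2 for any α, while a uniform distribution over four tokens reaches 1.0 at α = 0.5 and 0.75 at α = 0.25.

```python
import itertools


def reweight(p, perm, alpha):
    """ASSUMED stand-in layer: DiPmark-style alpha-reweighting along `perm`."""
    out = [0.0] * len(p)
    cdf, prev = 0.0, 0.0
    for tok in perm:
        cdf += p[tok]
        mapped = max(cdf - alpha, 0.0) + max(cdf - (1.0 - alpha), 0.0)
        out[tok] = mapped - prev
        prev = mapped
    return out


def expected_green_ratio(p, alpha):
    """Exact E_key[probability mass on the green list] over all permutations.

    Hypothetical green list: the last half of the key-permuted vocabulary,
    so an unwatermarked distribution (alpha = 0) scores exactly 1/2.
    """
    n = len(p)
    perms = list(itertools.permutations(range(n)))
    total = 0.0
    for perm in perms:
        q = reweight(p, perm, alpha)
        green = perm[n // 2:]
        total += sum(q[t] for t in green)
    return total / len(perms)


if __name__ == "__main__":
    # From zero entropy (one-hot) to maximum entropy (uniform).
    for p in ([1.0, 0.0, 0.0, 0.0], [0.97, 0.01, 0.01, 0.01], [0.25] * 4):
        print(p, [round(expected_green_ratio(p, a), 3) for a in (0.1, 0.25, 0.5)])
```

Read together with the layer-wise entropy decay, this suggests why an early layer that sharpens the distribution leaves later layers less room to raise their own green-list ratio.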
Problem

Research questions and friction points this paper is trying to address.

watermarking
entropy
large language models
watermark ensembles
token distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

watermark ensembles
entropy preservation
weak watermarking
green-list ratio
LLM watermarking
Ruibo Chen
University of Maryland, College Park
Yihan Wu
University of Maryland, College Park
Xuehao Cui
University of Maryland, College Park
Jingqi Zhang
National University of Singapore
Heng Huang
Brendan Iribe Endowed Professor in Computer Science, University of Maryland, College Park
Machine Learning, AI, Biomedical Data Science, Computer Vision