🤖 AI Summary
Existing multi-bit text watermarking methods for large language models prioritize capacity over reliability and often conflate decoding with detection, which leads to high false positive rates and low detection sensitivity. This work proposes BREW, a framework built on a “designated verification” paradigm. The message is embedded as block-wise codewords. At detection time, BREW decouples decoding from detection with a two-stage mechanism: independent block voting first estimates the message, and sliding-window verification then confirms it. This design targets the structural source of false positives. The approach is model-agnostic and robust to local edits. Under a 10% synonym substitution attack it achieves a true positive rate of 0.965 at a false positive rate of only 0.02, substantially outperforming current state-of-the-art methods.
📝 Abstract
Recent multi-bit watermarking methods for large language models (LLMs) prioritize capacity over reliability, often conflating decoding with detection. Our analysis reveals that existing ECC-based extractors suffer from catastrophic false positive rates (FPR), and that applying rejection thresholds only collapses detection sensitivity (TPR) to the level of random guessing. To resolve this structural limitation, we propose \textbf{BREW} (Block-wise Reliable Embedding for Watermarking), a framework that shifts the paradigm to \emph{designated verification}. BREW employs a two-stage mechanism: (i) \textbf{blind message estimation} via independent block voting, followed by (ii) \textbf{window-shifting verification}, which rigorously validates the estimated payload while remaining robust to local edits. Experiments show that BREW achieves a TPR of 0.965 with an FPR of 0.02 under 10\% synonym substitution. This indicates that the high-FPR issue is not an inherent trade-off of multi-bit watermarking but a solvable structural flaw of prior decoding-centric designs. Our framework is model-agnostic and theoretically grounded, providing a scalable solution for reliable forensic deployment.
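The abstract does not spell out BREW's embedding or scoring rules. The sketch below is therefore only a toy illustration of the two-stage idea under stated assumptions, not the authors' algorithm. It assumes each token has already been reduced to one observed bit (for example by a keyed hash) and that consecutive blocks of `block_len` tokens carry the message bits cyclically. Stage (i) estimates each bit by majority vote over even token positions. Stage (ii) tests that designated message on the odd positions with an exact one-sided binomial test. It takes the best of several global alignment shifts, a simplified stand-in for window shifting, and applies a Bonferroni correction. All names and parameters (`estimate_message`, `verify`, `watermark_bits`, `block_len`, `max_shift`, `alpha`, `strength`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Toy model (assumption, not BREW's published algorithm): every token has
# already been reduced to one observed bit, e.g. by a keyed hash, and
# consecutive blocks of `block_len` tokens carry message bits 0..m-1 cyclically.


def block_of(pos: int, block_len: int, m: int) -> int:
    """Index of the message bit that token position `pos` is assumed to carry."""
    return (pos // block_len) % m


def estimate_message(bits: Sequence[int], m: int, block_len: int) -> List[int]:
    """Stage (i) sketch: blind estimate of each bit by majority vote.

    Only even positions vote, so odd positions remain unused evidence."""
    ones, total = [0] * m, [0] * m
    for pos in range(0, len(bits), 2):
        j = block_of(pos, block_len, m)
        ones[j] += bits[pos]
        total[j] += 1
    # Strict majority of ones -> 1; ties and empty blocks -> 0.
    return [1 if 2 * o > t else 0 for o, t in zip(ones, total)]


def binom_tail_half(k: int, n: int) -> float:
    """Exact P[X >= k] for X ~ Binomial(n, 1/2), using integer arithmetic."""
    return sum(math.comb(n, i) for i in range(max(k, 0), n + 1)) / 2 ** n


def verify(bits: Sequence[int], msg: Sequence[int], block_len: int,
           max_shift: int = 4, alpha: float = 1e-3) -> Tuple[bool, float]:
    """Stage (ii) sketch: test a designated message on odd positions.

    Shift s compares position pos with the bit expected at pos - s, which can
    absorb a small insertion or deletion. Taking the best of several shifts
    would inflate false positives, so the p-value is Bonferroni-corrected."""
    m = len(msg)
    odd = range(1, len(bits), 2)
    n_shifts = 2 * max_shift + 1
    best_p = 1.0
    for s in range(-max_shift, max_shift + 1):
        matches = sum(bits[pos] == msg[block_of(pos - s, block_len, m)]
                      for pos in odd)
        best_p = min(best_p, binom_tail_half(matches, len(odd)))
    p_value = min(1.0, best_p * n_shifts)
    return p_value < alpha, p_value


def watermark_bits(msg: Sequence[int], n_tokens: int, block_len: int,
                   strength: float, rng: random.Random) -> List[int]:
    """Toy generator: each token's bit agrees with its block's message bit
    with probability `strength`, otherwise it is flipped."""
    m = len(msg)
    out = []
    for pos in range(n_tokens):
        b = msg[block_of(pos, block_len, m)]
        out.append(b if rng.random() < strength else 1 - b)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    block_len = 32
    marked = watermark_bits(msg, 512, block_len, 0.9, rng)
    edited = [rng.randint(0, 1) for _ in range(3)] + marked  # 3-token insertion
    plain = [rng.randint(0, 1) for _ in range(512)]           # unwatermarked text

    est = estimate_message(edited, len(msg), block_len)
    print("estimate:", est, "matches msg:", est == msg)
    print("edited text accepted:", verify(edited, est, block_len)[0])
    plain_est = estimate_message(plain, len(msg), block_len)
    print("plain text accepted:", verify(plain, plain_est, block_len)[0])
```

The even/odd split is one simple way, chosen here only for illustration, to keep the verification statistic honest. If the same tokens both selected the message and tested it, majority voting would guarantee at least 50% agreement within every block even on unwatermarked text. That is a small-scale version of the decoding-centric false-positive problem described above.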