WorldCup Sampling for Multi-bit LLM Watermarking

📅 2026-02-02
🤖 AI Summary
This work addresses the limitations of existing multi-bit watermarking methods for large language models, which rely on seed-based guidance and consequently suffer from indirect information embedding, limited capacity, and suboptimal decoding performance. The authors model text generation as a communication channel and propose a hierarchical competitive sampling mechanism, guided by complementary signals, that embeds information bits directly during token selection. To preserve text quality, they add entropy-aware modulation, and they improve robustness through a confidence-aware decoding strategy. According to the reported experiments, the resulting framework consistently outperforms prior baselines while balancing watermark capacity, detectability, robustness, text quality, and decoding efficiency.

📝 Abstract
As large language models (LLMs) generate increasingly human-like text, watermarking offers a promising solution for reliable attribution beyond mere detection. While multi-bit watermarking enables richer provenance encoding, existing methods largely extend zero-bit schemes through seed-driven steering, leading to indirect information flow, limited effective capacity, and suboptimal decoding. In this paper, we propose WorldCup, a multi-bit watermarking framework for LLMs that treats sampling as a natural communication channel and embeds message bits directly into token selection via a hierarchical competition mechanism guided by complementary signals. WorldCup also adopts entropy-aware modulation to preserve generation quality and supports robust message recovery through confidence-aware decoding. Comprehensive experiments show that WorldCup achieves a strong balance across capacity, detectability, robustness, text quality, and decoding efficiency, consistently outperforming prior baselines and laying a solid foundation for future LLM watermarking studies.
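To make the idea of embedding a bit through competition between candidate tokens more concrete, here is a toy Python sketch. It is not the paper's algorithm: the abstract gives no implementation details, so every name here (`g_value`, `score`, `tournament_sample`, `generate`, `decode_bit`), the hash-based scoring, the uniform toy "model", and the choice to embed a single bit across the whole text are assumptions made purely for illustration. Entropy-aware modulation and confidence-aware decoding are left out.

```python
import hashlib
import random


def g_value(key, context, token, layer):
    """Keyed pseudorandom bit in {0, 1} for a (context, token, layer) triple."""
    payload = f"{key}|{context}|{token}|{layer}".encode()
    return hashlib.sha256(payload).digest()[0] & 1


def score(key, context, token, layer, bit):
    """Assumed 'complementary signals': bit 1 favors g = 1, bit 0 favors g = 0."""
    g = g_value(key, context, token, layer)
    return g if bit == 1 else 1 - g


def tournament_sample(candidates, key, context, bit, rng):
    """Knock-out competition over 2**m candidates; the higher score advances."""
    pool = list(candidates)
    layer = 0
    while len(pool) > 1:
        winners = []
        for a, b in zip(pool[0::2], pool[1::2]):
            sa = score(key, context, a, layer, bit)
            sb = score(key, context, b, layer, bit)
            if sa > sb:
                winners.append(a)
            elif sb > sa:
                winners.append(b)
            else:
                winners.append(rng.choice([a, b]))  # break ties at random
        pool = winners
        layer += 1
    return pool[0]


def generate(bit, key, length, vocab_size=1000, m=3, seed=0):
    """Toy generator: a uniform 'model' stands in for real LLM sampling."""
    rng = random.Random(seed)
    tokens, prev = [], -1
    for _ in range(length):
        candidates = [rng.randrange(vocab_size) for _ in range(2 ** m)]
        tok = tournament_sample(candidates, key, prev, bit, rng)
        tokens.append(tok)
        prev = tok  # the previous token serves as the context
    return tokens


def decode_bit(tokens, key, m=3):
    """Average the keyed g-values over tokens and layers; above 0.5 means bit 1."""
    total, prev = 0, -1
    for tok in tokens:
        for layer in range(m):
            total += g_value(key, prev, tok, layer)
        prev = tok
    return 1 if total / (len(tokens) * m) > 0.5 else 0
```

In this sketch, `decode_bit(generate(1, "secret", 200), "secret")` recovers the embedded bit without access to the model, because winners of each round are statistically biased toward the g-value the bit favors. A multi-bit variant could, for example, assign different message positions to different tokens, but that assignment is likewise an assumption rather than something the abstract specifies.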
Problem

Research questions and friction points this paper is trying to address.

LLM watermarking
multi-bit watermarking
token sampling
information embedding
generation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-bit watermarking
hierarchical competition
entropy-aware modulation
confidence-aware decoding
LLM watermarking
Yidan Wang
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Yubing Ren
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Yanan Cao
Institute of Information Engineering, Chinese Academy of Sciences
Li Guo
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China