🤖 AI Summary
Existing watermarking methods for large language models struggle to simultaneously achieve multi-bit embedding, high text quality, and robust detection. This work proposes a distortion-free multi-bit watermarking mechanism that embeds watermarks via measure-preserving random mirroring, leaving the token distribution unchanged. To enhance robustness against insertion and deletion attacks, a context-aware scheduler allocates watermark bits evenly across the generated text. The method unifies high-quality generation with strong detectability, embedding 54 watermark bits within 300 tokens while maintaining text quality comparable to non-watermarked generation. It improves bit accuracy by 8–12% and identifies up to 11% more watermarked texts at a 1% false positive rate. The study also provides a theoretical analysis based on the equal error rate to interpret the empirical results.
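To make the mirroring idea concrete, below is a minimal sketch of how measure-preserving mirroring of sampling randomness can embed a bit without changing the token distribution. It assumes inverse-transform (CDF) sampling driven by a keyed hash; the helper names (`prf_uniform`, `sample_token`, `watermarked_step`, `detect_bit`) and the exact construction are illustrative assumptions, not the paper's implementation.

```python
import hashlib

import numpy as np


def prf_uniform(key: bytes, context: tuple) -> float:
    """Keyed pseudorandom number in [0, 1) derived from the generation context.

    (Hypothetical helper; the paper's pseudorandom construction may differ.)
    """
    digest = hashlib.sha256(key + repr(context).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sample_token(probs: np.ndarray, u: float) -> int:
    """Inverse-transform sampling: pick the token whose CDF interval contains u."""
    idx = int(np.searchsorted(np.cumsum(probs), u, side="right"))
    return min(idx, len(probs) - 1)  # guard against floating-point edge cases


def watermarked_step(probs: np.ndarray, key: bytes, context: tuple, bit: int) -> int:
    """Embed one message bit by optionally mirroring the sampling randomness.

    The map u -> 1 - u is measure-preserving on [0, 1), so the token's marginal
    distribution is identical whichever bit is embedded (distortion-free).
    """
    u = prf_uniform(key, context)
    u_used = u if bit == 0 else 1.0 - u
    return sample_token(probs, u_used)


def detect_bit(token: int, probs: np.ndarray, key: bytes, context: tuple):
    """Recover the bit by testing which orientation of u explains the sampled token.

    A real detector would aggregate soft scores over many tokens; this toy check
    returns None when the token's probability interval covers both candidates.
    """
    u = prf_uniform(key, context)
    cdf = np.cumsum(probs)
    lo = cdf[token - 1] if token > 0 else 0.0
    hi = cdf[token]
    in_plain = lo <= u < hi
    in_mirror = lo <= 1.0 - u < hi
    if in_plain and not in_mirror:
        return 0
    if in_mirror and not in_plain:
        return 1
    return None  # ambiguous for this step


# Toy usage: embed and recover one bit at a single generation step.
probs = np.array([0.1, 0.6, 0.3])        # next-token distribution from the LM
key, ctx = b"secret-key", (42, 7, 99)    # watermark key and preceding token ids
tok = watermarked_step(probs, key, ctx, bit=1)
print(detect_bit(tok, probs, key, ctx))  # 1, or None if the interval is too wide
```

Because u and 1 − u are identically distributed, an observer without the key sees exactly the original sampling distribution; detection only requires replaying the keyed randomness.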
📝 Abstract
As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but existing methods either provide only binary signals or distort the sampling distribution, degrading text quality; distortion-free approaches, in turn, often suffer from weak detectability or robustness. We propose MirrorMark, a multi-bit and distortion-free watermark for LLMs. By mirroring sampling randomness in a measure-preserving manner, MirrorMark embeds multi-bit messages without altering the token probability distribution, preserving text quality by design. To improve robustness, we introduce a context-based scheduler that balances token assignments across message positions while remaining resilient to insertions and deletions. We further provide a theoretical analysis of the equal error rate to interpret empirical performance. Experiments show that MirrorMark matches the text quality of non-watermarked generation while achieving substantially stronger detectability: with 54 bits embedded in 300 tokens, it improves bit accuracy by 8–12% and correctly identifies up to 11% more watermarked texts at a 1% false positive rate.
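The context-based scheduler mentioned in the abstract can be illustrated with a small sketch: each token is assigned to a message-bit position by hashing a short window of preceding tokens rather than by its absolute index, so insertions or deletions elsewhere do not shift the assignments of unaffected tokens. The window size, hash choice, and keying below are assumptions for illustration, not the paper's exact scheduler.

```python
import hashlib


def schedule_position(prev_tokens: list[int], num_bits: int, key: bytes,
                      window: int = 3) -> int:
    """Assign the next token to a message-bit position from its local context.

    The assignment depends only on the last `window` token ids, not on the
    absolute token index, so edits elsewhere in the text do not shift the
    positions assigned to unaffected tokens. A well-mixed hash also spreads
    assignments roughly evenly over the num_bits positions.
    """
    ctx = tuple(prev_tokens[-window:])
    digest = hashlib.sha256(key + repr(ctx).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bits


# Example: decide which bit of a 54-bit payload the next token should carry.
message = [1, 0, 1] * 18                      # toy 54-bit message
prev_tokens = [1012, 771, 4053, 220, 318]     # token ids generated so far
pos = schedule_position(prev_tokens, num_bits=len(message), key=b"secret-key")
print(f"next token carries message bit {pos}: {message[pos]}")
```

The paper's scheduler additionally balances how many tokens land on each message position; the plain hash-based rule above only approximates that balance and is meant to convey the context-dependence that gives robustness to insertions and deletions.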