MC$^2$Mark: Distortion-Free Multi-Bit Watermarking for Long Messages

📅 2026-02-15

🤖 AI Summary

Existing text watermarking methods struggle to preserve generation quality while keeping the watermark strong enough to embed long messages, which limits traceability. The authors propose MC$^2$Mark, a distortion-free multi-bit watermarking framework that encodes message bits through Multi-Channel Colored Reweighting and strengthens the watermark signal with Multi-Layer Sequential Reweighting, while keeping the language model’s output token distribution unbiased. An evidence-accumulation detector recovers the embedded message. The method achieves near-perfect decoding accuracy for short messages and, for long messages, exceeds the second-best method by nearly 30%, without degrading text quality.

📝 Abstract

Large language models now produce text indistinguishable from human writing, which increases the need for reliable provenance tracing. Multi-bit watermarking can embed identifiers into generated text, but existing methods struggle to keep both text quality and watermark strength while carrying long messages. We propose MC$^2$Mark, a distortion-free multi-bit watermarking framework designed for reliable embedding and decoding of long messages. Our key technical idea is Multi-Channel Colored Reweighting, which encodes bits through structured token reweighting while keeping the token distribution unbiased, together with Multi-Layer Sequential Reweighting to strengthen the watermark signal and an evidence-accumulation detector for message recovery. Experiments show that MC$^2$Mark improves detectability and robustness over prior multi-bit watermarking methods while preserving generation quality, achieving near-perfect accuracy for short messages and exceeding the second-best method by nearly 30% for long messages.
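To make "distortion-free multi-bit watermarking" and "evidence accumulation" concrete, here is a minimal Python sketch. It is not MC$^2$Mark's Multi-Channel Colored Reweighting: it swaps in the well-known exponential-minimum (Gumbel-max) sampling trick and simply folds the message into the pseudorandom seed. The toy distribution `PROBS`, the key string, the `(position, previous token)` context, and all function names are assumptions made for illustration only.

```python
import hashlib
import math
import random

# Hypothetical toy next-token distribution over an 8-token vocabulary.
PROBS = [0.2, 0.2, 0.15, 0.15, 0.1, 0.1, 0.05, 0.05]


def keyed_uniforms(key, context, message, vocab_size):
    """One pseudorandom number in (0, 1) per token, seeded by (key, context, message)."""
    digest = hashlib.sha256(f"{key}|{context}|{message}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    eps = 1e-12  # keep values strictly inside (0, 1) so the logs below stay finite
    return [min(max(rng.random(), eps), 1.0 - eps) for _ in range(vocab_size)]


def sample_token(probs, key, context, message):
    """Pick argmax_v log(r_v) / p_v; averaged over keys, token v is chosen with probability p_v."""
    r = keyed_uniforms(key, context, message, len(probs))
    candidates = [(math.log(r[v]) / p, v) for v, p in enumerate(probs) if p > 0]
    return max(candidates)[1]


def generate(probs, key, message, length):
    """Generate tokens from the toy model while embedding `message` (an integer)."""
    tokens, prev = [], -1
    for t in range(length):
        prev = sample_token(probs, key, (t, prev), message)
        tokens.append(prev)
    return tokens


def decode(tokens, key, num_messages, vocab_size):
    """Accumulate per-token evidence for every candidate message; return the best one and all scores."""
    scores = []
    for m in range(num_messages):
        total, prev = 0.0, -1
        for t, tok in enumerate(tokens):
            r = keyed_uniforms(key, (t, prev), m, vocab_size)
            total += -math.log(1.0 - r[tok])  # large when r[tok] is close to 1
            prev = tok
        scores.append(total)
    best = max(range(num_messages), key=scores.__getitem__)
    return best, scores
```

Under the correct message, each chosen token tends to have a pseudorandom value near 1, so its summed score grows faster than the scores of wrong candidates, whose values look uniform. Because this toy enumerates every candidate message and ties the seed to the token position, it scales poorly to long messages and is not robust to insertions or deletions, which is part of what structured multi-bit schemes aim to address.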

Problem

Research questions and friction points this paper is trying to address.

multi-bit watermarking

long messages

text quality

watermark strength

provenance tracing

Innovation

Methods, ideas, or system contributions that make the work stand out.

distortion-free watermarking

multi-bit watermarking

Multi-Channel Colored Reweighting

evidence-accumulation detection

long-message embedding
