Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Inference-time watermarks for autoregressive models break down in continuous modalities such as audio because discretization is inconsistent: the tokens recovered from the output can differ from the tokens that were generated. Approaches that fix this by fine-tuning the tokenizer give up the training-free advantage. This work shows that discrete representation learning itself supports robust watermarking. The authors propose a fine-tuning-free, gradient-free token replacement mechanism that uses community detection to uncover vocabulary redundancy in audio's discrete representations and to build a reduced vocabulary for watermark embedding and detection. The method sets a new state of the art for token-level watermarking in multimedia: it is robust to audio perturbations such as compression, noise addition, and resampling, and it improves detectability by several orders of magnitude.
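The reduced-vocabulary idea can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical table of confusion counts (how often token `a` comes back as token `b` after decoding and re-tokenizing), builds a weighted graph from it, and uses a small deterministic label-propagation routine as a stand-in for whichever community detection algorithm the paper uses. All function names and the sample counts are made up for illustration.

```python
from collections import defaultdict


def build_confusion_graph(confusions, min_count=1):
    """Undirected weighted graph linking tokens that get confused with each other.

    `confusions` maps (original_token, recovered_token) -> count (hypothetical data).
    """
    graph = defaultdict(dict)
    for (a, b), count in confusions.items():
        if a != b and count >= min_count:
            graph[a][b] = graph[a].get(b, 0) + count
            graph[b][a] = graph[b].get(a, 0) + count
    return graph


def label_propagation(graph, vocab, max_iters=20):
    """Weighted label propagation with deterministic order and tie-breaking."""
    labels = {tok: tok for tok in vocab}
    for _ in range(max_iters):
        changed = False
        for tok in sorted(vocab):
            neighbours = graph.get(tok)
            if not neighbours:
                continue  # isolated tokens keep their own label
            weight = defaultdict(int)
            for nb, w in neighbours.items():
                weight[labels[nb]] += w
            # Heaviest neighbouring label wins; ties go to the smallest label.
            best = min(weight, key=lambda lab: (-weight[lab], lab))
            if best != labels[tok]:
                labels[tok] = best
                changed = True
        if not changed:
            break
    return labels


def reduced_vocabulary(labels):
    """Map each original token to a contiguous community id."""
    groups = defaultdict(list)
    for tok, lab in labels.items():
        groups[lab].append(tok)
    ordered = sorted(groups.values(), key=min)
    return {tok: idx for idx, members in enumerate(ordered) for tok in members}


# Hypothetical confusion counts for a 6-token vocabulary.
confusions = {(0, 1): 5, (1, 0): 4, (1, 2): 3, (3, 4): 6, (4, 3): 2}
vocab = range(6)
graph = build_confusion_graph(confusions)
mapping = reduced_vocabulary(label_propagation(graph, vocab))
print(mapping)
```

In this toy setup, tokens that are frequently swapped for one another collapse into a single reduced-vocabulary entry, so a swap inside a community no longer changes what a detector sees.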

📝 Abstract

As policy catches up with the capabilities of generative AI, watermarking is central to content provenance efforts. Inference-time watermarks for autoregressive models are unfit for continuous modalities due to discretization inconsistencies. Existing methods overcome this by finetuning the modality tokenizers, nullifying the watermark's training-free advantage. In this work, motivated by the vocabulary redundancy of discretization, we propose an elegant solution for powerful and robust watermarking of synthetic audio. We theoretically analyze the impact of token errors on watermark detection, and effectively mitigate them using a reduced vocabulary obtained via community detection. Thorough experiments showcase that our gradient-free method can boost detectability by several orders of magnitude, while also achieving built-in robustness to audio modifications. Broadly, we discover a new state-of-the-art for token-level watermarks in multimedia, which simply arises from the nature of discrete representation learning.
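To see why token errors matter for detection, here is a rough sketch under simplifying assumptions. The abstract does not say which watermarking scheme is analyzed, so the code assumes a common green-list design in which the detector counts "green" tokens and runs a one-proportion z-test. It further assumes that a token corrupted by re-tokenization lands in the green list only at the chance rate `gamma`. The names `green_z_score`, `expected_z_with_errors`, `p_green`, and `error_rate` are illustrative and do not come from the paper.

```python
import math


def green_z_score(num_green, num_tokens, gamma):
    """z-statistic for observing `num_green` green tokens when chance rate is `gamma`."""
    return (num_green - gamma * num_tokens) / math.sqrt(
        num_tokens * gamma * (1 - gamma)
    )


def expected_z_with_errors(num_tokens, gamma, p_green, error_rate):
    """Expected z-score when a fraction `error_rate` of tokens is corrupted.

    Assumption: an intact watermarked token is green with probability `p_green`,
    while a corrupted token is green only with the chance probability `gamma`.
    """
    effective_green = (1 - error_rate) * p_green + error_rate * gamma
    return green_z_score(effective_green * num_tokens, num_tokens, gamma)


for eps in (0.0, 0.25, 0.5, 1.0):
    z = expected_z_with_errors(num_tokens=400, gamma=0.5, p_green=0.9, error_rate=eps)
    print(f"error_rate={eps:.2f}  expected z={z:.2f}")
```

Under these assumptions the expected z-score shrinks linearly in `1 - error_rate`. The model is optimistic: in schemes that seed the green list from preceding tokens, one wrong token can also invalidate the check for the token after it. Merging easily confused tokens into one reduced-vocabulary entry lowers the effective error rate, which is the intuition behind the detectability gain the abstract reports.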

Problem

Research questions and friction points this paper is trying to address.

watermarking

synthetic audio

discretization

robustness

tokenization

Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient-free watermarking

token redundancy

community detection

synthetic audio

discrete representation


Cohere

