Multi-bit Audio Watermarking

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses embedding imperceptible multi-bit watermarks into 44.1 kHz stereo audio without training an embedder-detector model or depending on a training dataset. The proposed method operates in the latent space of a pre-trained audio variational autoencoder (VAE): for each audio clip, gradient-based optimization adds a small latent perturbation that minimizes a combined message loss and perceptual loss. At extraction time, a frozen, pre-trained CLAP model recovers the watermark bits without any detector training. The method is presented as the first fully training-free, end-to-end differentiable audio watermarking framework. Evaluated on MUSDB18-HQ, it yields the lowest average bit error rate (BER) under common signal-processing attacks, including filtering, additive noise, MP3 compression, and resampling. Subjective listening tests confirm perceptual transparency. Overall, the approach outperforms existing supervised and generative audio watermarking methods in both robustness and imperceptibility.
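
Since robustness is reported as average bit error rate, a minimal sketch of that metric may help. The function names, the example 16-bit message, and the flipped positions below are illustrative assumptions, not taken from the paper.

```python
def bit_error_rate(sent, received):
    """Fraction of positions where the decoded bit differs from the embedded bit."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have equal length")
    if not sent:
        raise ValueError("message must contain at least one bit")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


def average_ber(pairs):
    """Mean BER over (sent, received) pairs, e.g. one pair per attacked clip."""
    rates = [bit_error_rate(s, r) for s, r in pairs]
    return sum(rates) / len(rates)


# Hypothetical 16-bit message; two bits are flipped to mimic an attack.
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
decoded = list(message)
decoded[3] ^= 1
decoded[10] ^= 1

print(bit_error_rate(message, decoded))                        # 2 / 16 = 0.125
print(average_ber([(message, decoded), (message, message)]))   # 0.0625
```

A BER of 0.5 corresponds to guessing, so lower values indicate that more of the message survives the attack.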

📝 Abstract
We present Timbru, a post-hoc audio watermarking model that achieves state-of-the-art robustness and imperceptibility trade-offs without training an embedder-detector model. Given any 44.1 kHz stereo music snippet, our method performs per-audio gradient optimization to add imperceptible perturbations in the latent space of a pretrained audio VAE, guided by a combined message and perceptual loss. The watermark can then be extracted using a pretrained CLAP model. We evaluate 16-bit watermarking on MUSDB18-HQ against AudioSeal, WavMark, and SilentCipher across common filtering, noise, compression, resampling, cropping, and regeneration attacks. Our approach attains the best average bit error rates, while preserving perceptual quality, demonstrating an efficient, dataset-free path to imperceptible audio watermarking.
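
The per-audio optimization described in the abstract can be illustrated with a toy sketch. This is not the Timbru implementation: the identity "decoder" stands in for the pretrained audio VAE, the random linear map `W` stands in for the CLAP-based extractor, the squared norm of the perturbation stands in for the perceptual loss, and all names and hyperparameters (`embed`, `lam`, `lr`, `steps`) are assumptions chosen for illustration. Only the overall structure follows the abstract: keep the models fixed, and run gradient descent on a latent perturbation under a combined message and perceptual loss.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def toy_decode(latent):
    # Assumption: identity map as a stand-in for the frozen VAE decoder.
    return list(latent)


def toy_logits(audio, W):
    # Assumption: one linear score per bit as a stand-in for the frozen detector.
    return [dot(row, audio) for row in W]


def embed(z, W, bits, lam=1e-3, lr=0.05, steps=1000):
    """Gradient descent on delta for: logistic message loss + lam * ||delta||^2."""
    targets = [2 * b - 1 for b in bits]  # map bits {0, 1} to signs {-1, +1}
    delta = [0.0] * len(z)
    for _ in range(steps):
        audio = toy_decode([zi + di for zi, di in zip(z, delta)])
        logits = toy_logits(audio, W)
        # d/d(logit) of log(1 + exp(-t * logit)) is -t * sigmoid(-t * logit).
        coeffs = [-t * sigmoid(-t * l) for t, l in zip(targets, logits)]
        # Gradient of the proxy perceptual term lam * ||delta||^2.
        grad = [2.0 * lam * d for d in delta]
        # Chain rule through the linear detector and the identity decoder.
        for c, row in zip(coeffs, W):
            for j, w in enumerate(row):
                grad[j] += c * w
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta


def extract(audio, W):
    # Threshold each detector score at zero to read one bit.
    return [1 if l > 0 else 0 for l in toy_logits(audio, W)]


rng = random.Random(0)
n_latent, n_bits = 32, 16
z = [rng.uniform(-1, 1) for _ in range(n_latent)]                        # toy latent
W = [[rng.uniform(-1, 1) for _ in range(n_latent)] for _ in range(n_bits)]
message = [rng.randint(0, 1) for _ in range(n_bits)]

delta = embed(z, W, message)
watermarked = toy_decode([zi + di for zi, di in zip(z, delta)])
decoded = extract(watermarked, W)
print("bits recovered:", decoded == message)
print("perturbation energy:", dot(delta, delta))
```

In a real setting the gradient would come from automatic differentiation through the pretrained models rather than from the hand-derived expression used here, and the weight `lam` would trade message recovery against audible distortion.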
Problem

Research questions and friction points this paper is trying to address.

Achieving robust imperceptible audio watermarking without training embedder-detector models
Optimizing perturbations in pretrained VAE latent space for watermark embedding
Extracting watermarks using pretrained CLAP model under various audio attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Performs per-audio gradient optimization in the latent space of a pretrained audio VAE
Uses a pretrained audio VAE for embedding and a pretrained CLAP model for extraction
Achieves robustness without training an embedder-detector model