Multi-bit Audio Watermarking

📅 2025-10-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of embedding imperceptible, multi-bit watermarks into 44.1 kHz stereo audio without training an embedder-detector model or depending on a training dataset. The proposed method operates in the latent space of a pre-trained audio variational autoencoder (VAE): for each audio clip, gradient-based optimization adds a small latent perturbation that minimizes a combined message loss and perceptual loss. During extraction, a frozen, pre-trained CLAP model enables zero-shot watermark detection. The method is described as, to the authors' knowledge, the first fully training-free, end-to-end differentiable audio watermarking framework. Evaluated on MUSDB18-HQ, it yields the lowest average bit error rate (BER) among the compared methods under common signal-processing attacks, including filtering, additive noise, MP3 compression, and resampling. Subjective listening tests confirm perceptual transparency. Overall, the approach outperforms existing supervised and generative audio watermarking methods in the trade-off between robustness and imperceptibility.

📝 Abstract

We present Timbru, a post-hoc audio watermarking model that achieves state-of-the-art robustness and imperceptibility trade-offs without training an embedder-detector model. Given any 44.1 kHz stereo music snippet, our method performs per-audio gradient optimization to add imperceptible perturbations in the latent space of a pretrained audio VAE, guided by a combined message and perceptual loss. The watermark can then be extracted using a pretrained CLAP model. We evaluate 16-bit watermarking on MUSDB18-HQ against AudioSeal, WavMark, and SilentCipher across common filtering, noise, compression, resampling, cropping, and regeneration attacks. Our approach attains the best average bit error rates, while preserving perceptual quality, demonstrating an efficient, dataset-free path to imperceptible audio watermarking.
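The abstract reports results as bit error rates over 16-bit messages. As a minimal illustration of that metric only, the sketch below computes the fraction of mismatched bits between an embedded and an extracted message. The payload and the two flipped positions are made up for the example and are not results from the paper.

```python
def bit_error_rate(sent, received):
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if len(sent) != len(received):
        raise ValueError("messages must have the same length")
    if not sent:
        raise ValueError("messages must not be empty")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)


# Hypothetical 16-bit payload (illustrative values, not from the paper).
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]

# Simulate an extraction in which two bits were flipped by some attack.
extracted = list(message)
for index in (3, 10):
    extracted[index] = 1 - extracted[index]

ber = bit_error_rate(message, extracted)  # 2 errors out of 16 bits
```

An average BER over several attacks would then be the mean of this value across the attacked copies of each clip.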

Problem

Research questions and friction points this paper is trying to address.

Achieving robust imperceptible audio watermarking without training embedder-detector models

Optimizing perturbations in pretrained VAE latent space for watermark embedding

Extracting watermarks using pretrained CLAP model under various audio attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Per-audio gradient optimization in latent space

Uses pretrained VAE and CLAP models

Achieves robustness without embedder-detector training
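
The per-audio optimization idea listed above can be sketched on a toy problem. The example below is not the paper's implementation: a scaled identity matrix stands in for the frozen VAE decoder, four fixed ±1 key vectors stand in for the frozen CLAP-based detector, and a squared L2 distance stands in for the perceptual loss. All names, sizes, and hyperparameters (`embed`, `lam`, `lr`, `steps`) are assumptions chosen for illustration. Only the latent perturbation is updated; the two "models" stay fixed.

```python
import math


def matvec(matrix, vector):
    """Return matrix @ vector for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matvec_t(matrix, vector):
    """Return transpose(matrix) @ vector."""
    rows, cols = len(matrix), len(matrix[0])
    return [sum(matrix[i][j] * vector[i] for i in range(rows)) for j in range(cols)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def extract_bits(detector, audio):
    """Toy detector: one logit per bit, thresholded at zero."""
    return [1 if logit > 0 else 0 for logit in matvec(detector, audio)]


def embed(latent, bits, decoder, detector, lam=0.1, lr=0.1, steps=200):
    """Gradient descent on a latent perturbation; decoder and detector stay frozen.

    Loss = binary cross-entropy(detector(decoder(latent + delta)), bits)
           + lam * ||decoder(delta)||^2   (stand-in for a perceptual loss)
    """
    delta = [0.0] * len(latent)
    for _ in range(steps):
        audio = matvec(decoder, [z + d for z, d in zip(latent, delta)])
        logits = matvec(detector, audio)
        # Gradient of the cross-entropy with respect to each logit.
        grad_logits = [sigmoid(logit) - bit for logit, bit in zip(logits, bits)]
        grad_audio = matvec_t(detector, grad_logits)
        # Gradient of the quadratic penalty with respect to the decoded audio.
        decoded_delta = matvec(decoder, delta)
        grad_audio = [g + 2.0 * lam * d for g, d in zip(grad_audio, decoded_delta)]
        # Chain rule back through the (linear) decoder to the latent space.
        grad_delta = matvec_t(decoder, grad_audio)
        delta = [d - lr * g for d, g in zip(delta, grad_delta)]
    return delta


# Toy stand-ins (assumptions): 8-dim latent, decoder = 0.5 * identity,
# detector = four mutually orthogonal +/-1 key vectors (Walsh-Hadamard rows).
decoder = [[0.5 if i == j else 0.0 for j in range(8)] for i in range(8)]
detector = [[1.0 if bin(r & c).count("1") % 2 == 0 else -1.0 for c in range(8)]
            for r in range(1, 5)]
latent = [0.3, -0.1, 0.4, 0.2, -0.5, 0.1, 0.0, -0.2]
bits = [1, 0, 1, 1]

delta = embed(latent, bits, decoder, detector)
watermarked = matvec(decoder, [z + d for z, d in zip(latent, delta)])
recovered = extract_bits(detector, watermarked)
```

In this sketch `lam` sets the trade-off the abstract refers to: a larger value keeps the decoded perturbation smaller at the cost of weaker detector logits. A real system would replace the linear maps with the pretrained networks and rely on automatic differentiation instead of hand-written gradients.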
