Latent-Mark: An Audio Watermark Robust to Neural Resynthesis

📅 2026-03-05
🤖 AI Summary
This work proposes the first zero-bit audio watermarking framework designed to be robust against semantic compression, such as that induced by neural audio codecs, which often removes subtle waveform perturbations used by conventional methods. By introducing directional perturbations in a shared, codec-invariant latent space of neural audio codecs, the watermark manifests as a detectable shift in the direction of latent representations while remaining imperceptible to human listeners. The approach integrates cross-codec joint optimization with audio manifold constraints to achieve zero-shot transfer robustness against unseen neural codecs. Experimental results demonstrate state-of-the-art detection performance under diverse neural resynthesis attacks and traditional digital signal processing operations, all while preserving high perceptual audio quality.
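
The zero-bit detection idea described above can be illustrated with a minimal sketch. It assumes the detector holds a secret unit vector (`key`) in latent space and thresholds a cosine score. The names `cosine_score` and `detect`, the threshold value, and the random stand-in latents are all hypothetical and are not taken from the paper.

```python
import numpy as np

def cosine_score(latent, key):
    """Cosine similarity between a latent vector and the secret key direction."""
    denom = np.linalg.norm(latent) * np.linalg.norm(key) + 1e-12
    return float(latent @ key / denom)

def detect(latent, key, threshold=0.2):
    """Zero-bit decision: True if the latent leans toward the key direction.

    The threshold is an illustrative value, not one reported by the authors.
    """
    return cosine_score(latent, key) > threshold

rng = np.random.default_rng(0)
dim = 512

# Secret direction known only to the detector (assumed to be a unit vector).
key = rng.standard_normal(dim)
key /= np.linalg.norm(key)

# Stand-in for the latent of unmarked audio: a random high-dimensional vector.
clean = rng.standard_normal(dim)

# Stand-in for the latent of watermarked audio: the same vector shifted along the key.
marked = clean + 0.5 * np.linalg.norm(clean) * key

print("clean :", detect(clean, key))
print("marked:", detect(marked, key))
```

Because the decision only asks whether a shift along the key is present, no message bits are decoded, which is what makes the scheme zero-bit.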

📝 Abstract
While existing audio watermarking techniques have achieved strong robustness against traditional digital signal processing (DSP) attacks, they remain vulnerable to neural resynthesis. This occurs because modern neural audio codecs act as semantic filters and discard the imperceptible waveform variations used in prior watermarking methods. To address this limitation, we propose Latent-Mark, the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the codec's invariant latent space. We achieve this by optimizing the audio waveform to induce a detectable directional shift in its encoded latent representation, while constraining perturbations to align with the natural audio manifold to ensure imperceptibility. To prevent overfitting to a single codec's quantization rules, we introduce Cross-Codec Optimization, jointly optimizing the waveform across multiple surrogate codecs to target shared latent invariants. Extensive evaluations demonstrate robust zero-shot transferability to unseen neural codecs, achieving state-of-the-art resilience against traditional DSP attacks while preserving imperceptibility. We hope this work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
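
The optimization described in the abstract can be sketched in toy form. This is not the authors' implementation. Each "codec encoder" is assumed to be a small `tanh` projection, the shared latent structure is modeled as a common matrix plus codec-specific noise, and a simple L-infinity budget (`eps`) stands in for the audio manifold constraint. All names (`make_toy_encoder`, `score_and_grad`, `embed`) and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SAMPLES, LATENT_DIM = 256, 32

# Assumption: codecs share structure, modeled here as a common projection
# plus codec-specific noise. Real codecs are far more complex.
SHARED = rng.standard_normal((LATENT_DIM, N_SAMPLES)) / np.sqrt(N_SAMPLES)

def make_toy_encoder():
    """Hypothetical surrogate encoder weights (latent = tanh(W @ x))."""
    noise = rng.standard_normal((LATENT_DIM, N_SAMPLES)) / np.sqrt(N_SAMPLES)
    return SHARED + 0.3 * noise

def score_and_grad(W, x, key):
    """Projection of the latent onto the key, and its gradient w.r.t. x."""
    z = np.tanh(W @ x)
    return float(key @ z), W.T @ ((1.0 - z ** 2) * key)

def embed(x, encoders, key, eps=0.02, steps=50, lr=0.005):
    """Projected sign-gradient ascent on the mean score over all surrogates."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        grads = [score_and_grad(W, x + delta, key)[1] for W in encoders]
        # Step along the averaged gradient, then project back into the budget.
        delta = np.clip(delta + lr * np.sign(np.mean(grads, axis=0)), -eps, eps)
    return x + delta

key = rng.standard_normal(LATENT_DIM)
key /= np.linalg.norm(key)
x = 0.1 * rng.standard_normal(N_SAMPLES)      # toy "waveform"
surrogates = [make_toy_encoder() for _ in range(3)]
held_out = make_toy_encoder()                 # never seen during embedding

marked = embed(x, surrogates, key)
before = score_and_grad(held_out, x, key)[0]
after = score_and_grad(held_out, marked, key)[0]
print(f"held-out score: {before:.3f} -> {after:.3f}")
```

Averaging gradients over several surrogates pushes the perturbation toward directions the encoders have in common, so the score also rises on the held-out encoder. In this toy, that transfer only happens because the encoders were constructed to share structure.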
Problem

Research questions and friction points this paper is trying to address.

audio watermarking
neural resynthesis
semantic compression
neural audio codecs
robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

audio watermarking
neural resynthesis
latent space
zero-bit watermarking
cross-codec optimization
Authors
Yen-Shan Chen
National Taiwan University, Taiwan, CyCraft AI Lab, Taiwan
Shih-Yu Lai
National Taiwan University, Taiwan, RIKEN Center for Computational Science (RIKEN-CCS), Japan, MoonShine Animation Studio, Taiwan
Ying-Jung Tsou
National Taiwan University, Taiwan
Yi-Cheng Lin
National Taiwan University
Speech Processing, Machine Learning, Fairness
Bing-Yu Chen
National Taiwan University
Computer Graphics, Human-Computer Interaction
Yun-Nung Chen
National Taiwan University, Taiwan
Hung-Yi Lee
National Taiwan University, Taiwan
Shang-Tse Chen
Associate Professor, National Taiwan University
Machine Learning, Artificial Intelligence, Security, Algorithmic Game Theory, Data Science