HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal

📅 2025-11-26
🤖 AI Summary
The proliferation of AI-generated audio has intensified the risks of misinformation and voice spoofing, making audio watermarking a critical defense. Yet the robustness of watermarks against removal has not been adequately or objectively evaluated. Existing watermark removal methods either rely on unrealistic assumptions (e.g., access to watermark content or secret keys) or incur prohibitive computational overhead, which compromises the validity of such evaluations. To address this, we propose HarmonicAttack, an adaptive, cross-domain watermark removal framework that requires only black-box access to the targeted scheme's watermark generator, with no knowledge of the watermark content, key, or other internals. Our approach employs a dual-path convolutional autoencoder that jointly models time- and frequency-domain features, augmented with adversarial (GAN-style) training. It transfers to out-of-distribution samples with minimal degradation and has substantially lower computational cost than prior methods. Evaluated on the state-of-the-art watermarking schemes AudioSeal, WavMark, and SilentCipher, our method removes watermarks more effectively than previous removal methods at near-real-time speed, providing a more credible and efficient benchmark for assessing watermark robustness.

📝 Abstract
The availability of high-quality, AI-generated audio raises security challenges such as misinformation campaigns and voice-cloning fraud. A key defense against the misuse of AI-generated audio is to watermark it so that it can be easily distinguished from genuine audio. Because those seeking to misuse AI-generated audio may try to remove these watermarks, studying effective removal techniques is critical to objectively evaluating the robustness of audio watermarks. Previous watermark removal schemes either assume impractical knowledge of the watermarks they are designed to remove or are computationally expensive, potentially creating a false sense of confidence in current watermark schemes. We introduce HarmonicAttack, an efficient audio watermark removal method that requires only the ability to generate watermarks with the targeted scheme. With this, we train a general watermark removal model that can remove the targeted scheme's watermarks from any watermarked audio sample. HarmonicAttack employs a dual-path convolutional autoencoder that operates in both the temporal and frequency domains, along with GAN-style training, to separate the watermark from the original audio. When evaluated against the state-of-the-art watermark schemes AudioSeal, WavMark, and SilentCipher, HarmonicAttack demonstrates greater watermark removal ability than previous watermark removal methods, with near real-time performance. Moreover, although HarmonicAttack requires training, we find that it transfers to out-of-distribution samples with minimal degradation in performance.
Problem

Research questions and friction points this paper is trying to address.

Developing efficient audio watermark removal to evaluate watermark robustness
Overcoming limitations of impractical knowledge requirements in removal schemes
Creating adaptive cross-domain removal that transfers to unseen samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a dual-path convolutional autoencoder that operates in both the temporal and frequency domains
Employs GAN-style training to separate the watermark from the original audio
Requires only watermark generation ability for removal
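
The points above can be made concrete with a small toy sketch. It is not the authors' implementation: `toy_embed` is a hypothetical stand-in for a black-box watermark embedder (it just adds a faint sinusoid), and `dual_domain_distance` is an assumed objective that compares two signals in both the time and frequency domains. The listing does not describe the paper's actual architecture or loss, so every name and parameter here is illustrative only.

```python
import cmath
import math
import random


def toy_embed(clean, strength=0.05, cycles=5):
    """Hypothetical stand-in for a black-box watermark embedder.

    Adds a faint sinusoid; real watermarking schemes are far more complex.
    """
    n = len(clean)
    return [x + strength * math.sin(2 * math.pi * cycles * i / n)
            for i, x in enumerate(clean)]


def make_training_pairs(clean_clips, embed):
    """Build (watermarked, clean) pairs using only the ability to embed."""
    return [(embed(clip), clip) for clip in clean_clips]


def dft_magnitudes(signal):
    """Magnitude spectrum via a naive O(n^2) DFT (fine for short toy clips)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))) / n
            for k in range(n // 2 + 1)]


def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dual_domain_distance(estimate, target, freq_weight=1.0):
    """Time-domain error plus weighted magnitude-spectrum error (assumed objective)."""
    time_term = mean_abs_diff(estimate, target)
    freq_term = mean_abs_diff(dft_magnitudes(estimate), dft_magnitudes(target))
    return time_term + freq_weight * freq_term


# Toy data: four random 64-sample "clips" and their watermarked versions.
rng = random.Random(0)
clips = [[rng.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(4)]
pairs = make_training_pairs(clips, toy_embed)
watermarked, clean = pairs[0]
residual_distance = dual_domain_distance(watermarked, clean)
```

In a setup like this, a removal model would be trained to map each watermarked clip back toward its clean counterpart, so a distance such as `residual_distance` would shrink as training progresses. The key point is that the pairs are built without any knowledge of the watermark's content or key.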
Kexin Li
University of Toronto
Xiao Hu
University of Toronto
Ilya Grishchenko
University of Toronto
David Lie
University of Toronto
Computer Science, Computer Security, Operating Systems, Virtualization, Computer Architecture