SoK: How Robust is Audio Watermarking in Generative AI models?

📅 2025-03-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing audio watermarking schemes lack systematic evaluation against AI-induced distortions, particularly in generative AI contexts. Method: We construct a comprehensive attack framework encompassing 22 categories (109 variants) spanning signal-level, physical-level, and AI-generation-specific distortions; reproduce nine state-of-the-art watermarking methods; and propose eight novel, efficient attacks. We introduce the first taxonomy of audio watermarking techniques and identify 11 fundamental vulnerabilities. Results: We establish the first large-scale, multi-dimensional robustness benchmark tailored to generative AI threats, built upon LibriSpeech, VCTK, and MUSAN. Empirical evaluation reveals that no tested scheme withstands all attacks. To foster reproducible risk assessment, we publicly release our code, evaluation framework, and interactive visualization demo, providing industry with an open, standardized infrastructure for robustness validation.

๐Ÿ“ Abstract
Audio watermarking is increasingly used to verify the provenance of AI-generated content, enabling applications such as detecting AI-generated speech, protecting music IP, and defending against voice cloning. To be effective, audio watermarks must resist removal attacks that distort signals to evade detection. While many schemes claim robustness, these claims are typically tested in isolation and against a limited set of attacks. A systematic evaluation against diverse removal attacks is lacking, hindering practical deployment. In this paper, we investigate whether recent watermarking schemes that claim robustness can withstand a broad range of removal attacks. First, we introduce a taxonomy covering 22 audio watermarking schemes. Next, we summarize their underlying technologies and potential vulnerabilities. We then present a large-scale empirical study to assess their robustness. To support this, we build an evaluation framework encompassing 22 types of removal attacks (109 configurations) including signal-level, physical-level, and AI-induced distortions. We reproduce 9 watermarking schemes using open-source code, identify 8 new highly effective attacks, and highlight 11 key findings that expose the fundamental limitations of these methods across 3 public datasets. Our results reveal that none of the surveyed schemes can withstand all tested distortions. This evaluation offers a comprehensive view of how current watermarking methods perform under real-world threats. Our demo and code are available at https://sokaudiowm.github.io/.
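To make the notion of a signal-level removal attack concrete, here is a minimal sketch of one of the simplest such distortions: additive white Gaussian noise scaled to a target SNR. This is an illustrative example only; the function name and parameters are not taken from the paper's framework, which covers 22 attack types and 109 configurations.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Additive white Gaussian noise scaled to a target SNR (dB).

    A basic signal-level removal attack: the noise may mask an
    embedded watermark while keeping the audio intelligible.
    """
    rng = rng or np.random.default_rng(0)
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# toy example: a 1 kHz tone at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 1000 * t)
attacked = add_noise_at_snr(clean, snr_db=20)

# check that the achieved SNR is close to the 20 dB target
achieved = 10 * np.log10(np.mean(clean**2) / np.mean((attacked - clean)**2))
```

A robustness evaluation would then run the watermark detector on `attacked` and compare detection rates against the clean signal across a sweep of SNR values.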
Problem

Research questions and friction points this paper is trying to address.

Evaluates robustness of audio watermarking against diverse removal attacks
Identifies vulnerabilities in current watermarking schemes for AI-generated content
Assesses effectiveness of watermarking under real-world signal and AI distortions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic evaluation of 22 watermarking schemes
Framework with 22 removal attack types
Empirical study across 3 public datasets
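A study like the one above typically reports robustness as the fraction of watermark payload bits recovered after each attack. The sketch below shows this standard bit-accuracy metric; the helper name and the toy payload are illustrative, not the paper's actual evaluation code.

```python
import numpy as np

def bit_accuracy(embedded_bits, decoded_bits):
    """Fraction of watermark payload bits correctly recovered
    after an attack (1.0 = fully recovered, 0.5 ~ random guessing)."""
    embedded = np.asarray(embedded_bits)
    decoded = np.asarray(decoded_bits)
    return float(np.mean(embedded == decoded))

# toy 8-bit payload; two bits flipped by a hypothetical attack
payload = np.array([1, 0, 1, 1, 0, 0, 1, 0])
decoded_after_attack = np.array([1, 0, 0, 1, 0, 0, 1, 1])
print(bit_accuracy(payload, decoded_after_attack))  # 0.75
```

Averaging this metric over many audio clips and attack configurations yields the kind of multi-dimensional robustness benchmark the paper describes.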
🔎 Similar Papers
No similar papers found.
Yizhu Wen
University of Hawaii at Manoa
Ashwin Innuganti
Michigan State University
Aaron Bien Ramos
University of Hawaii at Manoa
Hanqing Guo
University of Hawaii at Manoa
Qiben Yan
Computer Science and Engineering, Michigan State University
Security and Privacy · Cyber-Physical Systems · AI Agent · Internet-of-Things · Smart Contract