🤖 AI Summary
Existing audio watermarking methods support only model-level provenance tracing and cannot identify the source training datasets, which limits copyright protection and accountability. This paper proposes DualMark, the first audio watermarking framework to enable joint model- and dataset-level provenance tracing. Its core innovation is a dual-watermark embedding mechanism operating on Mel-spectrograms, coupled with a Watermark Consistency Loss that ensures robust extraction of both watermarks. The paper also introduces the Dual Attribution Benchmark (DAB), the first evaluation benchmark designed specifically for joint attribution. Experiments show that DualMark achieves a 97.01% F1-score for model attribution and 91.51% AUC for dataset attribution, while remaining highly robust to common perturbations, including pruning, compression, and additive noise, significantly outperforming existing baselines.
📝 Abstract
Existing watermarking methods for audio generative models enable only model-level attribution: they can identify the originating generation model but cannot trace the underlying training dataset. This limitation raises critical provenance questions, particularly where copyright and accountability are at stake. To bridge this gap, we introduce DualMark, the first dual-provenance watermarking framework capable of simultaneously encoding two distinct attribution signatures, namely model identity and dataset origin, into audio generative models during training. Specifically, we propose a novel Dual Watermark Embedding (DWE) module that seamlessly embeds both watermarks into Mel-spectrogram representations, accompanied by a carefully designed Watermark Consistency Loss (WCL) that ensures reliable extraction of both watermarks from generated audio signals. Moreover, we establish the Dual Attribution Benchmark (DAB), the first robustness evaluation benchmark tailored specifically to joint model-data attribution. Extensive experiments show that DualMark achieves strong attribution accuracy (97.01% F1-score for model attribution and 91.51% AUC for dataset attribution) while remaining robust against aggressive pruning, lossy compression, additive noise, and sampling attacks, conditions that severely degrade prior methods. Our work thus provides a foundational step toward fully accountable audio generative models, strengthening copyright protection and responsibility tracing.
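The abstract does not detail how the DWE module or the WCL are realized. As a rough illustration of the general idea only, the sketch below embeds two independent bit payloads (a hypothetical model ID and dataset ID) into a mel-spectrogram via additive spread-spectrum patterns, extracts them by correlating against the known patterns, and uses a hinge-style penalty as a stand-in for a consistency loss. All shapes, payload lengths, the embedding scheme, and the loss form are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

# NOTE: illustrative sketch only; DualMark's real DWE/WCL are not specified
# in the abstract. Sizes, payloads, and the loss are assumed for the demo.

rng = np.random.default_rng(0)

N_MELS, N_FRAMES = 80, 100          # mel-spectrogram size (assumed)
MODEL_BITS, DATA_BITS = 16, 16      # watermark payload lengths (assumed)

# One fixed random spreading pattern per payload bit (spread-spectrum style).
model_patterns = rng.standard_normal((MODEL_BITS, N_MELS, N_FRAMES))
data_patterns = rng.standard_normal((DATA_BITS, N_MELS, N_FRAMES))

def embed(mel, model_bits, data_bits, alpha=0.1):
    """Additively embed both watermarks into a mel-spectrogram."""
    signs_m = 2 * np.asarray(model_bits) - 1   # {0,1} -> {-1,+1}
    signs_d = 2 * np.asarray(data_bits) - 1
    wm = np.tensordot(signs_m, model_patterns, axes=1) \
       + np.tensordot(signs_d, data_patterns, axes=1)
    return mel + alpha * wm

def extract(mel):
    """Recover both payloads by correlating against the known patterns."""
    corr_m = np.tensordot(model_patterns, mel, axes=([1, 2], [0, 1]))
    corr_d = np.tensordot(data_patterns, mel, axes=([1, 2], [0, 1]))
    return (corr_m > 0).astype(int), (corr_d > 0).astype(int)

def consistency_loss(mel, model_bits, data_bits):
    """Hinge-style stand-in for a watermark consistency loss: penalize
    correlations whose sign disagrees with the embedded bits."""
    corr_m = np.tensordot(model_patterns, mel, axes=([1, 2], [0, 1]))
    corr_d = np.tensordot(data_patterns, mel, axes=([1, 2], [0, 1]))
    signs_m = 2 * np.asarray(model_bits) - 1
    signs_d = 2 * np.asarray(data_bits) - 1
    return np.maximum(0, 1 - signs_m * corr_m).mean() \
         + np.maximum(0, 1 - signs_d * corr_d).mean()

# Demo: embed two payloads into a random "spectrogram" and read them back.
mel = rng.standard_normal((N_MELS, N_FRAMES))
m_bits = rng.integers(0, 2, MODEL_BITS)
d_bits = rng.integers(0, 2, DATA_BITS)
marked = embed(mel, m_bits, d_bits)
rec_m, rec_d = extract(marked)
```

In this toy setup the watermarked spectrogram yields a much lower consistency loss than the clean one, mirroring (in spirit) how a WCL-like term would drive a generator to keep both watermarks recoverable.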