Membership and Dataset Inference Attacks on Large Audio Generative Models

📅 2025-12-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing single-sample membership inference (MI) methods fail on large-scale, heterogeneous audio datasets, hindering copyright protection for generative audio models. Method: We propose the first verifiable, audio-domain-specific training data attribution framework—Dataset Inference (DI)—a novel set-level inference paradigm that aggregates multi-artist audio samples to determine whether a given dataset contributed to model training. Our approach jointly leverages gradient and output-statistical features from diffusion and autoregressive audio models, incorporates a multi-sample evidence aggregation mechanism, and integrates contrastive benchmark modeling with statistical significance testing. Contribution/Results: Evaluated on multiple open-source large audio models, DI achieves high inference accuracy (AUC > 0.92), substantially outperforming state-of-the-art MI methods. This work provides the first empirically validated, technically feasible solution for audio content copyright auditing and training-data accountability.

Technology Category

Application Category

📝 Abstract
Generative audio models, based on diffusion and autoregressive architectures, have advanced rapidly in both quality and expressiveness. This progress, however, raises pressing copyright concerns, as such models are often trained on vast corpora of artistic and commercial works. A central question is whether one can reliably verify if an artist's material was included in training, thereby providing a means for copyright holders to protect their content. In this work, we investigate the feasibility of such verification through membership inference attacks (MIA) on open-source generative audio models, which attempt to determine whether a specific audio sample was part of the training set. Our empirical results show that membership inference alone is of limited effectiveness at scale, as the per-sample membership signal is weak for models trained on large and diverse datasets. However, artists and media owners typically hold collections of works rather than isolated samples. Building on prior work in text and vision domains, in this work we focus on dataset inference (DI), which aggregates diverse membership evidence across multiple samples. We find that DI is successful in the audio domain, offering a more practical mechanism for assessing whether an artist's works contributed to model training. Our results suggest DI as a promising direction for copyright protection and dataset accountability in the era of large audio generative models.
Problem

Research questions and friction points this paper is trying to address.

Investigates membership inference attacks on audio generative models
Examines dataset inference for copyright protection of artists' works
Assesses feasibility of verifying training data inclusion for accountability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset inference aggregates membership evidence across multiple audio samples
Membership inference alone is limited for large diverse training datasets
Dataset inference offers practical copyright protection for audio generative models
🔎 Similar Papers
No similar papers found.