🤖 AI Summary
Existing loss-based membership inference attacks struggle to distinguish training from non-training samples in music diffusion models at low false positive rates (FPR), because their reliance on reconstruction error is misaligned with human perceptual judgments. This work proposes LSA-Probe, the first method to use geometric stability on the generative manifold as a membership signal. By measuring the minimal time-normalized perturbation required during the reverse diffusion process to push a sample across a fixed perceptual degradation threshold, LSA-Probe exploits the higher degradation cost of training samples in latent space. Evaluated across multiple music diffusion models, the approach significantly outperforms conventional loss-based attacks, achieving superior effectiveness at low FPR while aligning better with human perception, which suggests strong potential for practical copyright auditing.
📝 Abstract
Membership inference attacks (MIAs) test whether a specific audio clip was used to train a model, making them a key tool for auditing generative music models for copyright compliance. However, loss-based signals (e.g., reconstruction error) are weakly aligned with human perception in practice, yielding poor separability at the low false-positive rates (FPRs) required for forensics. We propose the Latent Stability Adversarial Probe (LSA-Probe), a white-box method that measures a geometric property of the reverse diffusion process: the minimal time-normalized perturbation budget needed to cross a fixed perceptual degradation threshold at an intermediate diffusion state. We show that training members, which reside in more stable regions of the latent space, exhibit a significantly higher degradation cost.
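The abstract's probe can be sketched as a binary search over perturbation magnitude: perturb an intermediate diffusion state, denoise, and find the smallest (time-normalized) budget whose output crosses a degradation threshold. This is a minimal illustrative sketch, not the paper's implementation: the `denoise` callable, the relative-L2 `degradation` proxy (standing in for an unspecified perceptual metric), and all parameter names (`tau`, `eps_max`) are assumptions introduced here.

```python
import numpy as np

def degradation(x, x_ref):
    # Degradation proxy: relative L2 distance. A stand-in for the paper's
    # (unspecified here) perceptual degradation metric.
    return np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref)

def lsa_probe(x_t, denoise, t, tau=0.1, eps_max=1.0, iters=30, seed=0):
    """Hypothetical sketch of the probe: minimal time-normalized perturbation
    of the intermediate state x_t whose denoised output crosses the
    degradation threshold tau. Larger budget => more stable region =>
    more likely a training member, per the abstract's claim."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(np.shape(x_t))
    d /= np.linalg.norm(d)                  # unit perturbation direction
    x_ref = denoise(x_t, t)                 # unperturbed reconstruction
    # Binary-search the smallest magnitude that crosses the threshold.
    lo, hi = 0.0, eps_max
    if degradation(denoise(x_t + hi * d, t), x_ref) < tau:
        return np.inf                       # not reachable within eps_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if degradation(denoise(x_t + mid * d, t), x_ref) >= tau:
            hi = mid
        else:
            lo = mid
    return hi / t                           # time-normalized budget
```

With an identity "denoiser" the crossing point is analytic (tau times the norm of `x_t`), which makes the search easy to sanity-check; in a real attack `denoise` would be the model's reverse-diffusion step and the budget would be thresholded to decide membership.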