🤖 AI Summary
Music information retrieval (MIR) systems exhibit significant vulnerability to adversarial attacks, yet existing methods often incur perceptible audio distortion or lack transferability across attack settings.
Method: This paper proposes a novel adversarial attack framework leveraging generative audio inpainting. It first identifies salient, model-sensitive audio segments via importance analysis, then reconstructs them imperceptibly using a gradient- or output-guided audio inpainting model, enabling effective white-box and black-box attacks.
Contribution/Results: To our knowledge, this is the first work to integrate audio inpainting into adversarial example generation for MIR. The method achieves near-transparent perturbations, subjectively indistinguishable from clean audio, while substantially improving attack success rates. Extensive experiments demonstrate state-of-the-art performance across diverse MIR tasks, including music genre classification and automatic tagging. Our approach establishes a new paradigm for security evaluation and robustness enhancement of MIR systems.
📝 Abstract
Music adversarial attacks have garnered significant interest in the field of Music Information Retrieval (MIR). In this paper, we present Music Adversarial Inpainting Attack (MAIA), a novel adversarial attack framework that supports both white-box and black-box attack scenarios. MAIA begins with an importance analysis to identify critical audio segments, which are then targeted for modification. Utilizing generative inpainting models, these segments are reconstructed with guidance from the output of the attacked model, ensuring subtle and effective adversarial perturbations. We evaluate MAIA on multiple MIR tasks, demonstrating high attack success rates in both white-box and black-box settings while maintaining minimal perceptual distortion. Additionally, subjective listening tests confirm the high audio fidelity of the adversarial samples. Our findings highlight vulnerabilities in current MIR systems and emphasize the need for more robust and secure models.
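The black-box pipeline described above (importance analysis to pick critical segments, then output-guided inpainting of those segments) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the occlusion-based scoring, the linear-interpolation "inpainter" (a stand-in for MAIA's generative inpainting model), and all function names are assumptions for illustration.

```python
import numpy as np

def segment_importance(model, audio, seg_len):
    """Occlusion-based stand-in for MAIA's importance analysis:
    zero out each segment and measure how much the model's output
    changes; return the start of the most influential segment."""
    base = model(audio)
    scores = []
    for start in range(0, len(audio) - seg_len + 1, seg_len):
        occluded = audio.copy()
        occluded[start:start + seg_len] = 0.0
        scores.append((np.abs(model(occluded) - base).sum(), start))
    return max(scores)[1]

def inpaint_segment(audio, start, seg_len):
    """Placeholder inpainter: linearly interpolate across the segment.
    MAIA instead reconstructs the segment with a generative
    audio inpainting model."""
    out = audio.copy()
    left = audio[max(start - 1, 0)]
    right = audio[min(start + seg_len, len(audio) - 1)]
    out[start:start + seg_len] = np.linspace(left, right, seg_len)
    return out

def maia_sketch(model, audio, seg_len=8, max_iters=10):
    """Output-guided black-box loop: repeatedly inpaint the most
    model-sensitive segment, keeping a candidate only if it pushes
    the model's output further from the original prediction."""
    orig = model(audio)
    adv = audio.copy()
    for _ in range(max_iters):
        start = segment_importance(model, adv, seg_len)
        candidate = inpaint_segment(adv, start, seg_len)
        if np.abs(model(candidate) - orig).sum() > np.abs(model(adv) - orig).sum():
            adv = candidate
        else:
            break  # no further improvement from inpainting
    return adv
```

In a white-box setting, the paper's gradient-guided variant would steer the inpainting model directly with gradients of the attacked model rather than searching over its outputs as done here.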