🤖 AI Summary
Existing detectors for AI-generated music are trained on synthetic samples, so they generalize poorly to unseen generators and have limited practical utility. This work formulates a zero-shot detection setting in which the detector learns solely from real music, with no access to generated examples, and proposes MusicDET, a generator-agnostic framework for that setting. MusicDET uses a frequency-guided normalizing flow to model the probability distribution of real-music features and flags out-of-distribution inputs, such as AI-generated content, by their likelihood under that learned distribution. On the FakeMusicCaps and SONICS benchmarks, it consistently outperforms conventional discriminative detectors, particularly on music from previously unseen generative models.
📝 Abstract
Detecting AI-generated music is crucial for preserving artistic authenticity and preventing the misuse of generative music technologies. However, existing discriminative detectors typically rely on generated samples during training and often suffer from severe performance degradation when confronted with music produced by unseen generators, which limits their real-world applicability. To address this issue, we formulate a zero-shot setting for AI-generated music detection, where the detector is trained exclusively on real music without access to any generated samples. Under this setting, we propose MusicDET, a generator-agnostic detection framework based on frequency-guided normalizing flows that probabilistically models the distribution of real music features. By evaluating the likelihood of an input sample under the learned real-music distribution, MusicDET enables effective detection of out-of-distribution music signals. Experiments on the FakeMusicCaps and SONICS datasets show that MusicDET consistently outperforms conventional discriminative detectors, particularly when detecting music generated by previously unseen models.
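The likelihood-based decision rule described above can be illustrated with a minimal sketch. This is not the MusicDET architecture: as a stand-in for a learned frequency-guided flow, it uses a single affine (whitening) flow fitted in closed form, and the "features" are synthetic Gaussian vectors rather than real music features. The function names, the feature dimension, and the 5th-percentile threshold are all assumptions made for illustration.

```python
import numpy as np


def fit_affine_flow(real_feats):
    """Fit z = (x - mu) @ W on real data only, so that z is roughly N(0, I).

    A single affine map is the simplest possible normalizing flow; a
    practical detector would stack learned nonlinear invertible layers.
    """
    mu = real_feats.mean(axis=0)
    dim = real_feats.shape[1]
    cov = np.cov(real_feats, rowvar=False) + 1e-6 * np.eye(dim)
    chol = np.linalg.cholesky(cov)        # cov = chol @ chol.T
    W = np.linalg.inv(chol).T             # whitening matrix
    log_det = -np.sum(np.log(np.diag(chol)))  # log|det W|
    return mu, W, log_det


def log_likelihood(x, mu, W, log_det):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det W|."""
    z = (x - mu) @ W
    dim = x.shape[1]
    log_base = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * dim * np.log(2 * np.pi)
    return log_base + log_det


# Synthetic stand-ins (assumption): "real" features and a shifted
# distribution playing the role of music from an unseen generator.
rng = np.random.default_rng(0)
real_train = rng.normal(size=(2000, 8))
real_test = rng.normal(size=(500, 8))
shifted = rng.normal(size=(500, 8)) + 4.0

# Training uses real samples only; no generated sample is seen.
mu, W, log_det = fit_affine_flow(real_train)

# Threshold (assumption): 5th percentile of real training log-likelihoods.
threshold = np.percentile(log_likelihood(real_train, mu, W, log_det), 5)

# Inputs whose likelihood falls below the threshold are flagged.
flagged_real_rate = np.mean(log_likelihood(real_test, mu, W, log_det) < threshold)
flagged_shifted_rate = np.mean(log_likelihood(shifted, mu, W, log_det) < threshold)
print(f"flagged real: {flagged_real_rate:.2f}, flagged shifted: {flagged_shifted_rate:.2f}")
```

Because the threshold is set from real data alone, the rule makes no assumption about any particular generator. How well it separates real from generated music depends entirely on the feature extractor and on how expressive the flow is.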