🤖 AI Summary
This study addresses the copyright and attribution challenges in AI music generation that arise from ambiguous training data provenance. It applies unlearning-based training data attribution to a large-scale music generation model, introducing large-scale attribution to this domain. Methodologically, machine unlearning is applied to a text-to-music diffusion model, and the consistency of the resulting attributions is evaluated quantitatively through a grid search over hyperparameter configurations. The attribution patterns obtained from unlearning are then compared with those from a similarity-based approach. The experiments suggest that unlearning-based attribution is feasible for music generative models trained on large-scale datasets. The work offers a technical basis for assessing artistic contributions, allocating copyright responsibility, and supporting ethical governance in AI-assisted music creation.
📝 Abstract
This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed to the generation of a particular output from a specific model. This is crucial in the context of AI-generated music, where proper recognition and credit for original artists are generally overlooked. By enabling white-box attribution, our work supports a fairer system for acknowledging artistic contributions and addresses pressing concerns related to AI ethics and copyright. We apply unlearning-based attribution to a text-to-music diffusion model trained on a large-scale dataset and investigate its feasibility and behavior in this setting. To validate the method, we perform a grid search over different hyperparameter configurations and quantitatively evaluate the consistency of the unlearning approach. We then compare attribution patterns from unlearning with those from a similarity-based approach. Our findings suggest that unlearning-based approaches can be effectively adapted to music generative models, introducing large-scale TDA to this domain and paving the way for more ethical and accountable AI systems for music creation.
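The abstract does not spell out the attribution procedure, so the following is only a minimal sketch of one common form of unlearning-based attribution, under stated assumptions: the "model" is a toy linear regressor rather than a text-to-music diffusion model, unlearning is approximated by a few gradient-ascent steps on the loss of the generated output, and a training sample's score is the increase in its loss after unlearning. All names (`unlearn`, `attribution_scores`, `lr`, `steps`) are hypothetical and chosen for illustration. The small grid at the end mirrors the idea of checking whether rankings stay consistent across hyperparameter configurations.

```python
import numpy as np

def per_sample_loss(w, X, y):
    """Squared error of a toy linear model for each row of X."""
    return (X @ w - y) ** 2

def unlearn(w, x_q, y_q, lr, steps):
    """Assumed unlearning step: gradient ascent on the query's loss."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * (x_q @ w - y_q) * x_q  # d/dw of (x_q.w - y_q)^2
        w += lr * grad                      # ascend, i.e. forget the query
    return w

def attribution_scores(w, X, y, x_q, y_q, lr=0.05, steps=5):
    """Score = how much each training sample's loss rises after unlearning."""
    w_unlearned = unlearn(w, x_q, y_q, lr, steps)
    return per_sample_loss(w_unlearned, X, y) - per_sample_loss(w, X, y)

# Toy training set and a "trained" model that fits it exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 1.0]])
y = np.array([1.0, 1.0, 1.5])
w = np.array([1.0, 1.0])

# Stand-in for a generated output: close to training sample 0.
x_q, y_q = np.array([1.0, 0.0]), 0.9

scores = attribution_scores(w, X, y, x_q, y_q)

# Grid over (hypothetical) unlearning hyperparameters; collect rankings.
rankings = {
    (lr, steps): np.argsort(-attribution_scores(w, X, y, x_q, y_q, lr, steps)).tolist()
    for lr in (0.01, 0.05)
    for steps in (1, 5)
}
# Simple consistency check: do all configurations agree on the top sample?
top1_consistent = len({r[0] for r in rankings.values()}) == 1
print(scores, rankings, top1_consistent)
```

In this toy setting, training sample 0 shares the query's input direction and therefore receives the highest score, while sample 1, which is orthogonal to the query, is unaffected by the unlearning step. A real diffusion model would replace the squared error with the denoising loss and would likely need a more careful unlearning procedure than plain gradient ascent.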