APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing popularity prediction methods for AI-generated music: they often neglect aesthetic quality and therefore fail to capture human preferences accurately. To bridge this gap, the authors propose the first large-scale multi-task learning framework tailored to AI-generated music. Built on frozen self-supervised audio embeddings from MERT, it jointly predicts play counts, likes, and five perceptual aesthetic dimensions. The model is trained on roughly 211,000 AI-generated songs from Suno and Udio (about 10,000 hours of audio). On the Music Arena benchmark, which contains pairwise human preference battles between tracks from 11 generative systems unseen during training, adding the aesthetic predictions consistently improves preference prediction. These results underscore the role of explicit aesthetic modeling in helping preference prediction generalize across generative systems.
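The multi-task setup summarized above can be pictured as a small network sitting on top of precomputed embeddings. What follows is a minimal sketch under our own assumptions, not the authors' implementation. Each song is represented by one pooled embedding vector standing in for a frozen MERT embedding. A shared ReLU layer feeds seven linear outputs: two popularity targets and five aesthetic dimensions, whose names here are placeholders. The loss is an equally weighted sum of squared errors. The class `MultiTaskHead`, the helpers `mean_loss` and `train_demo`, the layer sizes, and the learning rate are all hypothetical. Because the embeddings are frozen, only the head's weights are updated.

```python
import math
import random

# Hypothetical target layout: two popularity targets plus five aesthetic
# dimensions. The aesthetic names are placeholders, not the paper's labels.
TASKS = ["plays", "likes"] + [f"aesthetic_{i}" for i in range(5)]


class MultiTaskHead:
    """Shared ReLU layer over a frozen embedding, one linear output per task."""

    def __init__(self, in_dim, hidden_dim, n_tasks, seed=0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(in_dim), 1.0 / math.sqrt(hidden_dim)
        self.w1 = [[rng.gauss(0, s1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0, s2) for _ in range(hidden_dim)] for _ in range(n_tasks)]
        self.b2 = [0.0] * n_tasks

    def forward(self, x):
        # Shared layer: pre-activations z and ReLU outputs h.
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        h = [max(0.0, v) for v in z]
        # Task-specific linear heads, all reading the same shared h.
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return z, h, y

    def step(self, x, target, lr, task_weights):
        """One SGD step on sum_k w_k * (y_k - t_k)^2; returns the pre-update loss."""
        z, h, y = self.forward(x)
        dy = [2.0 * w * (yk - tk) for w, yk, tk in zip(task_weights, y, target)]
        loss = sum(w * (yk - tk) ** 2 for w, yk, tk in zip(task_weights, y, target))
        # Gradient w.r.t. the shared layer, computed before the heads change.
        dh = [sum(dy[k] * self.w2[k][j] for k in range(len(dy))) for j in range(len(h))]
        dz = [g if zj > 0 else 0.0 for g, zj in zip(dh, z)]
        for k, g in enumerate(dy):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        # The input x itself is never updated: the embedding stays frozen.
        for j, g in enumerate(dz):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g
        return loss


def mean_loss(model, data, task_weights):
    total = 0.0
    for x, t in data:
        _, _, y = model.forward(x)
        total += sum(w * (yk - tk) ** 2 for w, yk, tk in zip(task_weights, y, t))
    return total / len(data)


def train_demo(dim=8, hidden=16, n=100, epochs=20, lr=0.01):
    """Train on synthetic 'embeddings' with linear toy targets; return (before, after) loss."""
    rng = random.Random(1)
    true_w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in TASKS]
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        t = [sum(a * b for a, b in zip(row, x)) / math.sqrt(dim) for row in true_w]
        data.append((x, t))
    weights = [1.0] * len(TASKS)  # equal task weighting (an assumption)
    model = MultiTaskHead(dim, hidden, len(TASKS))
    before = mean_loss(model, data, weights)
    for _ in range(epochs):
        for x, t in data:
            model.step(x, t, lr, weights)
    return before, mean_loss(model, data, weights)


if __name__ == "__main__":
    before, after = train_demo()
    print(f"mean training loss: {before:.3f} -> {after:.3f}")
```

The shared hidden layer is what makes this multi-task: gradients from the popularity and aesthetic targets both shape the same representation. Replacing the synthetic vectors with real pooled embeddings and the toy targets with normalized play counts, likes, and aesthetic ratings would depend on a data pipeline the summary does not describe.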

📝 Abstract

Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, in which a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. A key yet unexplored factor in this setting is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio. From frozen audio embeddings extracted with MERT, a self-supervised music understanding model, APEX jointly predicts engagement-based popularity signals (stream and like scores) alongside five perceptual aesthetic quality dimensions. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable. In an out-of-distribution evaluation on the Music Arena dataset, which comprises pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
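The Music Arena evaluation compares two tracks per battle against a human choice. One hedged way to score such battles, assumed here rather than taken from the paper, is a Bradley-Terry-style logistic model on the difference of each track's predicted scores, P(A beats B) = sigmoid(w · (f_A − f_B)), followed by measuring how often the human-preferred track receives the higher score. In this sketch each feature vector holds seven hypothetical predicted values (two popularity and five aesthetic scores), the battles are synthetic, and `fit_pairwise`, `pairwise_accuracy`, `make_synthetic_battles`, and `demo` are illustrative names.

```python
import math
import random


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def fit_pairwise(battles, lr=0.1, epochs=200):
    """Batch gradient descent for P(A beats B) = sigmoid(w . (f_A - f_B)).

    battles: list of (features_a, features_b, a_won) with a_won in {0, 1}.
    """
    dim = len(battles[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for fa, fb, a_won in battles:
            d = [a - b for a, b in zip(fa, fb)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, d)))
            # Gradient of the logistic (cross-entropy) loss for this battle.
            for i in range(dim):
                grad[i] += (p - a_won) * d[i]
        w = [wi - lr * g / len(battles) for wi, g in zip(w, grad)]
    return w


def pairwise_accuracy(w, battles):
    """Fraction of battles in which the human-preferred track gets the higher score."""
    correct = 0
    for fa, fb, a_won in battles:
        margin = sum(wi * (a - b) for wi, a, b in zip(w, fa, fb))
        correct += int((margin > 0) == bool(a_won))
    return correct / len(battles)


def make_synthetic_battles(n, rng, hidden_w):
    """Toy battles: random 'predicted scores' per track, winner sampled from hidden_w."""
    battles = []
    for _ in range(n):
        fa = [rng.gauss(0, 1) for _ in hidden_w]
        fb = [rng.gauss(0, 1) for _ in hidden_w]
        p = sigmoid(sum(w * (a - b) for w, a, b in zip(hidden_w, fa, fb)))
        battles.append((fa, fb, int(rng.random() < p)))
    return battles


def demo():
    rng = random.Random(0)
    # Seven hypothetical columns: 2 popularity scores followed by 5 aesthetic scores.
    hidden_w = [0.8, 0.5, 1.0, -0.4, 0.6, 0.3, 0.7]
    train = make_synthetic_battles(300, rng, hidden_w)
    test = make_synthetic_battles(300, rng, hidden_w)
    w = fit_pairwise(train)
    return pairwise_accuracy(w, test)


if __name__ == "__main__":
    print(f"held-out pairwise accuracy on toy battles: {demo():.3f}")
```

To mirror the comparison described in the abstract, one could fit the same model once on the popularity columns alone and once on all seven columns, then compare held-out accuracy. The numbers this toy script prints come from synthetic data and say nothing about the paper's results.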

Problem

Research questions and friction points this paper is trying to address.

music popularity prediction

AI-generated music

aesthetic quality

multi-task learning

out-of-distribution generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-task learning

aesthetic quality

AI-generated music