Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems

📅 2026-04-24
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study systematically evaluates nine state-of-the-art pretrained audio models, including MusicFM, MERT, and Jukebox, on music recommendation tasks, addressing a gap at the intersection of music information retrieval (MIR) and recommender systems. Through experiments combining five recommendation approaches (KNN, a shallow neural network, contrastive multimodal projection, a hybrid model, and BERT4Rec) under both hot and cold-start scenarios, the work provides a comprehensive analysis of how pretrained audio representations vary in utility for recommendation and how their informational value differs from that in traditional MIR tasks. The findings suggest that the choice of audio representation substantially affects recommendation performance, offering empirical grounding for using audio semantics to enhance recommender systems.

📝 Abstract
Over the years, the Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. Transfer learning has proven the effectiveness of pretrained backend models across a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore how effective pretrained models are for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network training. Our research addresses this gap and evaluates the performance of nine pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ and MuQ-MuLan) in the context of MRS. We assess them using five recommendation approaches: K-Nearest Neighbours (KNN), a Shallow Neural Network, Contrastive Multi-Modal projection, a Hybrid model, and BERT4Rec, in both hot and cold-start scenarios. Our findings suggest that pretrained audio representations exhibit a significant performance disparity between traditional MIR tasks and both hot and cold-start music recommendation, indicating that the valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.
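To make the simplest of the five approaches concrete, the following is a minimal sketch of KNN-style recommendation over frozen audio embeddings. It is an illustration under stated assumptions, not the paper's implementation: the function name `recommend_knn`, the mean-of-listened-tracks user profile, the cosine scoring, and the toy 2-dimensional embeddings are all hypothetical. In practice the embeddings would come from one of the pretrained backend models.

```python
import numpy as np

def recommend_knn(track_embeddings, listened_ids, k=3):
    """Rank unheard tracks by cosine similarity to a user's mean embedding.

    Assumption: the user profile is the average of the (normalised)
    embeddings of tracks the user has listened to.
    """
    # L2-normalise rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(track_embeddings, axis=1, keepdims=True)
    unit = track_embeddings / np.clip(norms, 1e-12, None)

    # Hypothetical user profile: mean embedding of the listened tracks.
    profile = unit[listened_ids].mean(axis=0)
    scores = unit @ profile

    # Exclude tracks the user already knows from the ranking.
    scores[listened_ids] = -np.inf

    # Indices of the k highest-scoring remaining tracks.
    return np.argsort(-scores)[:k].tolist()

# Toy embeddings standing in for pretrained audio representations.
embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.8, 0.2],
    [0.1, 0.9],
])

recs = recommend_knn(embeddings, listened_ids=[0, 1], k=2)
print(recs)
```

Because the scores depend only on audio embeddings, a scheme like this can rank tracks that have no interaction history, which is why content-based KNN is a natural baseline for the cold-start scenario.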
Problem

Research questions and friction points this paper is trying to address.

pretrained audio representations
music recommender systems
transfer learning
cold-start recommendation
music information retrieval