Universal Music Representations? Evaluating Foundation Models on World Music Corpora

📅 2025-06-20
🤖 AI Summary
This study evaluates the cross-cultural generalization of five audio foundation models on six musical corpora spanning Western popular, Greek, Turkish, and Indian classical traditions, to assess how close they come to universal music representations. The evaluation framework combines three methods: representation probing, targeted supervised fine-tuning of 1-2 layers, and multi-label few-shot learning for low-resource scenarios. Larger models typically perform better on non-Western music, yet results decline for culturally distant traditions. Targeted fine-tuning does not consistently outperform probing, which suggests that the models already encode substantial musical knowledge. The approaches achieve state-of-the-art performance on five of the six evaluated datasets.

📝 Abstract
Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize across diverse musical traditions. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora spanning Western popular, Greek, Turkish, and Indian classical traditions. We employ three complementary methodologies to investigate these models' cross-cultural capabilities: probing to assess inherent representations, targeted supervised fine-tuning of 1-2 layers, and multi-label few-shot learning for low-resource scenarios. Our analysis shows varying cross-cultural generalization, with larger models typically outperforming on non-Western music, though results decline for culturally distant traditions. Notably, our approaches achieve state-of-the-art performance on five out of six evaluated datasets, demonstrating the effectiveness of foundation models for world music understanding. We also find that our targeted fine-tuning approach does not consistently outperform probing across all settings, suggesting foundation models already encode substantial musical knowledge. Our evaluation framework and benchmarking results contribute to understanding how far current models are from achieving universal music representations while establishing metrics for future progress.
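To make the probing idea in the abstract concrete, here is a minimal sketch of a multi-label linear probe trained on frozen embeddings. It is an illustration under stated assumptions, not the authors' implementation: the embeddings are synthetic random vectors standing in for foundation-model outputs, and the function names `train_linear_probe` and `predict_tags`, the learning rate, and the epoch count are all hypothetical.

```python
import numpy as np


def train_linear_probe(embeddings, labels, lr=0.5, epochs=500):
    """Fit one sigmoid output per tag on frozen embeddings.

    Only the probe weights W and b are updated; the embeddings are
    treated as fixed features, which is what makes this a probe.
    """
    n, d = embeddings.shape
    k = labels.shape[1]
    W = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        logits = embeddings @ W + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        # Gradient of the summed per-tag binary cross-entropy,
        # averaged over the n examples, with respect to the logits.
        grad = (probs - labels) / n
        W -= lr * embeddings.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict_tags(embeddings, W, b, threshold=0.5):
    """Return a binary tag matrix by thresholding sigmoid outputs."""
    probs = 1.0 / (1.0 + np.exp(-(embeddings @ W + b)))
    return (probs >= threshold).astype(int)


# Synthetic stand-in for frozen model embeddings (assumption):
# 200 clips, 16-dimensional features, 3 tags that are linearly decodable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
true_W = rng.normal(size=(16, 3))
Y = (X @ true_W > 0).astype(float)

W, b = train_linear_probe(X, Y)
train_acc = (predict_tags(X, W, b) == Y).mean()
```

In a real setup, `X` would hold embeddings extracted once from a frozen model, and accuracy would be measured on held-out clips rather than on the training set.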
Problem

Research questions and friction points this paper is trying to address.

Evaluating audio foundation models on diverse world music traditions
Assessing cross-cultural generalization in music representation models
Developing methodologies for low-resource world music understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

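Evaluating five audio foundation models on six corpora spanning Western popular, Greek, Turkish, and Indian classical music
Combining probing, targeted fine-tuning of 1-2 layers, and multi-label few-shot learning
Achieving state-of-the-art performance on five of the six evaluated datasets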
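
The few-shot learning component listed above can be illustrated with a prototype-based sketch for the multi-label case. This is one common few-shot strategy, chosen here as an assumption; the summary does not specify the exact method the paper uses. The helper names `tag_prototypes` and `predict_multilabel`, the toy 2-D embeddings, and the similarity threshold are hypothetical.

```python
import numpy as np


def tag_prototypes(support_emb, support_tags):
    """Average the embeddings of the support items that carry each tag.

    support_emb:  (n, d) float array of embeddings.
    support_tags: (n, k) binary array; every tag needs at least one item.
    Returns a (k, d) array with one prototype per tag.
    """
    counts = support_tags.sum(axis=0)
    return (support_tags.T @ support_emb) / counts[:, None]


def predict_multilabel(query_emb, prototypes, threshold=0.5):
    """Assign every tag whose prototype is cosine-similar enough to the query."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = q @ p.T
    return (sims >= threshold).astype(int)


# Toy support set: two examples per tag (a "2-shot" setting).
support_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_tags = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
protos = tag_prototypes(support_emb, support_tags)

# The third query lies between both prototypes and receives both tags.
query = np.array([[1.0, 0.05], [0.05, 1.0], [1.0, 1.0]])
pred = predict_multilabel(query, protos)
```

Because each tag is thresholded independently, a query can receive zero, one, or several tags, which is what distinguishes this from single-label nearest-prototype classification.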