GlobalMood: A cross-cultural benchmark for music emotion recognition

📅 2025-05-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing music emotion recognition datasets rely predominantly on English affective terms and Western songs, limiting cross-cultural generalizability. To address this, we introduce GlobalMood, a cross-cultural benchmark dataset comprising 1,180 songs sampled from 59 countries, with large-scale annotations collected from 2,519 individuals across five culturally and linguistically distinct locations (U.S., France, Mexico, South Korea, and Egypt). Rather than imposing predefined mood categories, culture-specific mood terms were elicited bottom-up from participants, and a separate pool of participants then provided 988,925 ratings on these descriptors. The analysis confirms a valence-arousal structure shared across cultures while also revealing significant divergences in how mood terms that are dictionary equivalents are actually perceived. Fine-tuning state-of-the-art multimodal models on this cross-culturally balanced data substantially improves their alignment with human ratings, particularly in non-English settings (+12.7% Kendall τ). The findings inform the debate on the universality versus cultural specificity of emotional descriptors, and the dataset is publicly released.
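The reported Kendall τ gain refers to rank correlation between model-predicted and human-assigned mood ratings. Below is a minimal sketch of how such alignment is commonly computed, assuming per-song aggregate human ratings and model scores for each mood term; the array shapes and averaging scheme are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: rank-correlation alignment between model scores and
# human mood ratings (illustrative only; the paper's exact aggregation
# may differ).
import numpy as np
from scipy.stats import kendalltau

def alignment_score(human_ratings: np.ndarray, model_scores: np.ndarray) -> float:
    """Mean Kendall tau across mood terms.

    Both arrays have shape (n_songs, n_terms); each column holds
    per-song values for one mood descriptor.
    """
    taus = []
    for term in range(human_ratings.shape[1]):
        tau, _ = kendalltau(human_ratings[:, term], model_scores[:, term])
        taus.append(tau)
    return float(np.nanmean(taus))

# Toy example with random data standing in for real annotations.
rng = np.random.default_rng(0)
humans = rng.random((100, 12))                 # 100 songs x 12 mood terms
model = humans + 0.3 * rng.random((100, 12))   # noisy model predictions
print(f"mean Kendall tau: {alignment_score(humans, model):.3f}")
```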

📝 Abstract
Human annotations of mood in music are essential for music generation and recommender systems. However, existing datasets predominantly focus on Western songs with mood terms derived from English, which may limit generalizability across diverse linguistic and cultural backgrounds. To address this, we introduce 'GlobalMood', a novel cross-cultural benchmark dataset comprising 1,180 songs sampled from 59 countries, with large-scale annotations collected from 2,519 individuals across five culturally and linguistically distinct locations: U.S., France, Mexico, S. Korea, and Egypt. Rather than imposing predefined mood categories, we implement a bottom-up, participant-driven approach to organically elicit culturally specific music-related mood terms. We then recruit another pool of human participants to collect 988,925 ratings for these culture-specific descriptors. Our analysis confirms the presence of a valence-arousal structure shared across cultures, yet also reveals significant divergences in how certain mood terms, despite being dictionary equivalents, are perceived cross-culturally. State-of-the-art multimodal models benefit substantially from fine-tuning on our cross-culturally balanced dataset, as evidenced by improved alignment with human evaluations, particularly in non-English contexts. More broadly, our findings inform the ongoing debate on the universality versus cultural specificity of emotional descriptors, and our methodology can contribute to other multimodal and cross-lingual research.
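The valence-arousal structure mentioned in the abstract is typically recovered by reducing a songs-by-mood-terms rating matrix to two latent dimensions. The sketch below does this with PCA under the assumption that such a matrix is available; the term names and data are placeholders, and the paper's actual analysis pipeline may differ.

```python
# Minimal sketch: recovering a two-dimensional (valence-arousal-like)
# structure from a songs x mood-terms rating matrix via PCA.
# Illustrative only; not the paper's actual analysis code.
import numpy as np
from sklearn.decomposition import PCA

def two_dim_structure(ratings: np.ndarray, term_names: list[str]) -> dict:
    """Project mood terms onto the first two principal components."""
    # Standardize columns so each mood term contributes equally.
    z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
    pca = PCA(n_components=2)
    pca.fit(z)
    # Loadings: how strongly each mood term aligns with each component.
    loadings = pca.components_.T  # shape (n_terms, 2)
    return {name: tuple(loadings[i]) for i, name in enumerate(term_names)}

# Toy example with random ratings standing in for real annotations.
rng = np.random.default_rng(1)
terms = ["happy", "calm", "sad", "energetic"]
ratings = rng.random((200, len(terms)))  # 200 songs
print(two_dim_structure(ratings, terms))
```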
Problem

Research questions and friction points this paper is trying to address.

Limited generalizability of Western-centric music mood datasets across cultures
Lack of culturally diverse mood annotations for music emotion recognition
Need for cross-cultural validation of emotional descriptor universality in music
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-cultural dataset with 1,180 songs from 59 countries
Bottom-up approach for culturally specific mood terms
Multimodal models fine-tuned for cross-cultural emotion recognition (a sketch follows below)
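The last bullet can be illustrated by a minimal fine-tuning sketch: a pretrained audio encoder with a regression head trained to predict per-term mood ratings. The encoder, embedding size, dataset loader, and hyperparameters here are placeholders and assumptions, not the paper's configuration.

```python
# Minimal sketch: fine-tuning a pretrained audio encoder with a regression
# head that predicts mood-term ratings. Encoder, dimensions, and
# hyperparameters are placeholders, not the paper's setup.
import torch
import torch.nn as nn

class MoodRegressor(nn.Module):
    def __init__(self, audio_encoder: nn.Module, embed_dim: int, n_terms: int):
        super().__init__()
        self.encoder = audio_encoder                 # pretrained, hypothetical
        self.head = nn.Linear(embed_dim, n_terms)    # one output per mood term

    def forward(self, audio_batch: torch.Tensor) -> torch.Tensor:
        emb = self.encoder(audio_batch)              # (batch, embed_dim)
        return self.head(emb)                        # predicted ratings per term

def finetune(model, loader, epochs=3, lr=1e-4):
    """Regress onto aggregate human ratings from culture-balanced batches."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for audio, ratings in loader:
            opt.zero_grad()
            loss = loss_fn(model(audio), ratings)
            loss.backward()
            opt.step()
    return model

# Toy usage with a stub encoder standing in for a real pretrained model.
stub_encoder = nn.Sequential(nn.Flatten(), nn.Linear(16000, 256))
model = MoodRegressor(stub_encoder, embed_dim=256, n_terms=12)
```

Training on batches balanced across the five annotation locations, rather than on English-only data, is what the paper credits for the improved alignment in non-English contexts.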