🤖 AI Summary
Existing emotional speech corpora are predominantly monolingual and single-label. This limits their ability to model mixed emotions and code-switching in authentic contexts, which in turn compromises ecological validity and cross-lingual generalizability. To address this, we introduce EM2LDL, a new multilingual mixed-emotion speech corpus covering English, Mandarin, and Cantonese. Its fine-grained, multi-label annotations over 32 emotion categories support label distribution learning, so mixed emotions are represented as distributions rather than single labels. The corpus is built from spontaneous speech collected from online platforms. We run speaker-independent benchmark experiments with self-supervised models and observe robust performance across gender, age, and personality subgroups, with HuBERT-large-EN performing best. The corpus, annotations, and baseline code are publicly released, providing a testbed for more adaptive and empathetic affective computing systems.
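Neither the summary nor the abstract states how annotator judgments become distributions or which training loss the baselines use. The sketch below is therefore an assumption, not the authors' method: it normalizes hypothetical per-category vote counts into a distribution over the 32 categories and scores predictions with KL divergence, a common objective in label distribution learning. The function names are hypothetical.

```python
import numpy as np

NUM_CATEGORIES = 32  # number of emotion categories reported for EM2LDL


def votes_to_distribution(votes):
    """Normalize per-category vote counts (hypothetical annotation format) to sum to 1."""
    votes = np.asarray(votes, dtype=float)
    totals = votes.sum(axis=-1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("each utterance needs at least one vote")
    return votes / totals


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence_loss(target, logits, eps=1e-12):
    """Mean KL(target || softmax(logits)) over the batch.

    Zero-probability target entries contribute 0 because they are multiplied by 0.
    """
    pred = softmax(logits)
    per_item = np.sum(target * (np.log(target + eps) - np.log(pred + eps)), axis=-1)
    return float(per_item.mean())


# Usage: 3 of 5 hypothetical annotators chose category 0, and 2 chose category 5.
votes = np.zeros(NUM_CATEGORIES)
votes[0], votes[5] = 3, 2
target = votes_to_distribution(votes)[None, :]      # shape (1, 32): 0.6 and 0.4
uniform_logits = np.zeros((1, NUM_CATEGORIES))      # a model with no preference
loss = kl_divergence_loss(target, uniform_logits)
```

A uniform prediction is penalized because it spreads probability over the 30 categories no annotator chose. A prediction matching the 0.6/0.4 split drives the loss toward zero, which a single-label cross-entropy target could not express.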
📝 Abstract
This study introduces EM2LDL, a novel multilingual speech corpus designed to advance mixed emotion recognition through label distribution learning. Existing emotion corpora are predominantly monolingual and single-label: they restrict linguistic diversity, cannot model mixed emotions, and lack ecological validity. To address these limitations, EM2LDL comprises expressive utterances in English, Mandarin, and Cantonese, capturing the intra-utterance code-switching prevalent in multilingual regions such as Hong Kong and Macao. The corpus integrates spontaneous emotional expressions collected from online platforms, annotated with fine-grained emotion distributions across 32 categories. Experimental baselines using self-supervised learning models demonstrate robust performance in speaker-independent gender-, age-, and personality-based evaluations, with HuBERT-large-EN achieving the best results. By incorporating linguistic diversity and ecological validity, EM2LDL enables the exploration of complex emotional dynamics in multilingual settings. This work provides a versatile testbed for developing adaptive, empathetic systems for affective computing applications, including mental health monitoring and cross-cultural communication. The dataset, annotations, and baseline code are publicly available at https://github.com/xingfengli/EM2LDL.
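The abstract reports speaker-independent evaluations broken down by gender, age, and personality, but it does not describe the split itself. The following is a minimal sketch of the general idea, not the authors' protocol. It assumes utterance metadata stored as dicts with hypothetical `speaker_id` and `gender` keys, holds out whole speakers so that none appears in both train and test, and then buckets the test set by one attribute for subgroup scoring.

```python
import random
from collections import defaultdict


def speaker_independent_split(records, test_fraction=0.2, seed=0):
    """Split utterances so that no speaker appears in both train and test.

    `records` is assumed to be a list of dicts with a 'speaker_id' key (hypothetical schema).
    """
    speakers = sorted({r["speaker_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [r for r in records if r["speaker_id"] not in test_speakers]
    test = [r for r in records if r["speaker_id"] in test_speakers]
    return train, test


def group_by_attribute(records, key):
    """Bucket records by a speaker attribute (e.g. 'gender') for per-subgroup metrics."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return dict(groups)


# Usage with toy metadata: 5 hypothetical speakers, 4 utterances each.
records = [
    {"utt_id": f"u{i}", "speaker_id": f"spk{i % 5}", "gender": "F" if i % 5 < 2 else "M"}
    for i in range(20)
]
train, test = speaker_independent_split(records, test_fraction=0.4, seed=1)
test_groups = group_by_attribute(test, "gender")
```

Splitting by speaker rather than by utterance keeps a model from scoring well simply by recognizing voices it has already heard. scikit-learn's `GroupShuffleSplit` with speaker IDs as the groups implements the same idea.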