EmoTale: An Enacted Speech-emotion Dataset in Danish

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of modern emotional speech datasets for low-resource languages such as Danish, this work introduces EmoTale, a bilingual (Danish/English) acted emotional speech corpus and the first new Danish emotional speech resource since the 1997 Danish Emotional Speech (DES) corpus. EmoTale covers six basic emotions, recorded by professional actors and annotated with fine-grained metadata, enabling cross-lingual affective modeling and comparative analysis. The authors evaluate speech emotion recognition performance using self-supervised speech model (SSLM) embeddings and openSMILE acoustic features under leave-one-speaker-out cross-validation; the best model achieves an unweighted average recall (UAR) of 64.1% on the Danish subset, comparable to performance on DES and supporting the corpus's validity and utility. This contribution provides both a high-quality benchmark dataset and a methodological framework for emotion recognition in low-resource languages.

📝 Abstract
While multiple emotional speech corpora exist for commonly spoken languages, there is a lack of functional datasets for smaller (spoken) languages, such as Danish. To our knowledge, Danish Emotional Speech (DES), published in 1997, is the only other database of Danish emotional speech. We present EmoTale: a corpus comprising Danish and English speech recordings with their associated enacted emotion annotations. We demonstrate the validity of the dataset by investigating and presenting its predictive power using speech emotion recognition (SER) models. We develop SER models for EmoTale and the reference datasets using self-supervised speech model (SSLM) embeddings and the openSMILE feature extractor. We find the embeddings superior to the hand-crafted features. The best model achieves an unweighted average recall (UAR) of 64.1% on the EmoTale corpus using leave-one-speaker-out cross-validation, comparable to the performance on DES.
Problem

Research questions and friction points this paper is trying to address.

Lack of emotional speech datasets for the Danish language
Need for validated speech emotion recognition models
Comparison of self-supervised versus hand-crafted feature performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed a Danish-English enacted emotional speech dataset
Used self-supervised speech model (SSLM) embeddings
Achieved 64.1% UAR under leave-one-speaker-out cross-validation
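The evaluation protocol above can be sketched in code. The snippet below is a minimal, hypothetical illustration of leave-one-speaker-out (LOSO) cross-validation scored with unweighted average recall (UAR, i.e. macro-averaged recall); the synthetic random features stand in for the paper's SSLM embeddings or openSMILE features, and the logistic-regression classifier is an assumption, not the authors' actual model.

```python
# Hypothetical LOSO + UAR evaluation sketch; features and classifier
# are placeholders for the paper's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_speakers, clips_per_speaker, dim, n_emotions = 5, 18, 32, 6

# Synthetic stand-ins: one feature vector per clip, balanced emotion
# labels per speaker, and a speaker id for grouping the CV folds.
X = rng.normal(size=(n_speakers * clips_per_speaker, dim))
y = np.tile(np.arange(n_emotions), len(X) // n_emotions)
speakers = np.repeat(np.arange(n_speakers), clips_per_speaker)

uars = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    preds = clf.predict(X[test_idx])
    # UAR = mean of per-class recalls, robust to class imbalance.
    uars.append(recall_score(y[test_idx], preds, average="macro"))

print(f"Mean UAR over {len(uars)} held-out speakers: {np.mean(uars):.3f}")
```

With random features the mean UAR hovers near chance level (1/6 for six emotions), which is exactly why a result like 64.1% on real features is evidence that the corpus carries learnable emotional signal.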
Maja J. Hjuler
University Grenoble Alpes, CNRS, Grenoble INP, LIG, 38000 Grenoble, France
Harald V. Skat-Rørdam
Dept. of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Lyngby, Denmark
Line H. Clemmensen
University of Copenhagen
Machine learning, multivariate statistics, statistical modelling, sparse modelling
Sneha Das
Dept. of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Lyngby, Denmark