EmoTale: An Enacted Speech-emotion Dataset in Danish

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of modern emotional speech datasets for low-resource languages such as Danish, this work introduces EmoTale, a bilingual (Danish/English) acted emotional speech corpus and the first new Danish emotional speech resource since the 1997 Danish Emotional Speech (DES) corpus. EmoTale covers six basic emotions, recorded by professional actors and annotated with fine-grained metadata, enabling cross-lingual affective modeling and comparative analysis. The authors evaluate speech emotion recognition performance using self-supervised speech model (SSLM) embeddings and openSMILE acoustic features under leave-one-speaker-out cross-validation; the best model achieves an unweighted average recall (UAR) of 64.1% on the Danish subset, comparable to performance on DES and supporting the corpus's validity and utility. This contribution provides both a high-quality benchmark dataset and a methodological framework for emotion recognition in low-resource languages.

📝 Abstract
While multiple emotional speech corpora exist for commonly spoken languages, there is a lack of functional datasets for smaller (spoken) languages, such as Danish. To our knowledge, Danish Emotional Speech (DES), published in 1997, is the only other database of Danish emotional speech. We present EmoTale: a corpus comprising Danish and English speech recordings with their associated enacted emotion annotations. We demonstrate the validity of the dataset by investigating and presenting its predictive power using speech emotion recognition (SER) models. We develop SER models for EmoTale and the reference datasets using self-supervised speech model (SSLM) embeddings and the openSMILE feature extractor. We find the embeddings superior to the hand-crafted features. The best model achieves an unweighted average recall (UAR) of 64.1% on the EmoTale corpus using leave-one-speaker-out cross-validation, comparable to the performance on DES.
Problem

Research questions and friction points this paper is trying to address.

Lack of emotional speech datasets for the Danish language
Need for validated speech emotion recognition models
Comparison of self-supervised versus hand-crafted feature performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed a Danish-English enacted emotional speech dataset
Used self-supervised speech model (SSLM) embeddings
Achieved 64.1% UAR under leave-one-speaker-out cross-validation
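The evaluation protocol above can be sketched in code. The snippet below is a minimal, hypothetical illustration of leave-one-speaker-out (LOSO) cross-validation scored with unweighted average recall (UAR, i.e. macro-averaged recall); the synthetic random features stand in for the paper's SSLM embeddings or openSMILE features, and the logistic-regression classifier is an assumption, not the authors' actual model.

```python
# Hypothetical LOSO + UAR evaluation sketch; features and classifier
# are placeholders for the paper's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_speakers, clips_per_speaker, dim, n_emotions = 5, 18, 32, 6

# Synthetic stand-ins: one feature vector per clip, balanced emotion
# labels per speaker, and a speaker id for grouping the CV folds.
X = rng.normal(size=(n_speakers * clips_per_speaker, dim))
y = np.tile(np.arange(n_emotions), len(X) // n_emotions)
speakers = np.repeat(np.arange(n_speakers), clips_per_speaker)

uars = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    preds = clf.predict(X[test_idx])
    # UAR = mean of per-class recalls, robust to class imbalance.
    uars.append(recall_score(y[test_idx], preds, average="macro"))

print(f"Mean UAR over {len(uars)} held-out speakers: {np.mean(uars):.3f}")
```

With random features the mean UAR hovers near chance level (1/6 for six emotions), which is exactly why a result like 64.1% on real features is evidence that the corpus carries learnable emotional signal.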
Maja J. Hjuler
University Grenoble Alpes, CNRS, Grenoble INP, LIG, 38000 Grenoble, France
Harald V. Skat-Rørdam
Dept. of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Lyngby, Denmark
Line H. Clemmensen
University of Copenhagen
Machine learning, multivariate statistics, statistical modelling, sparse modelling
Sneha Das
Dept. of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Lyngby, Denmark