Switchboard-Affect: Emotion Perception Labels from Conversational Speech

📅 2025-10-14
🤖 AI Summary
Existing speech emotion recognition (SER) models are predominantly trained and evaluated on acted or pseudo-acted speech (e.g., podcast speech), in which emotional expression may be exaggerated or intentionally modified; moreover, crowdsourced annotations often lack transparent, standardized guidelines, hindering model interpretability and targeted improvement. Method: Leveraging the Switchboard corpus of naturalistic conversational speech, we construct SWB-Affect, a transparently curated affective label set for spontaneous dialogue covering 10 categorical emotions and three dimensional attributes (activation, valence, dominance). We document explicit, reproducible annotation guidelines integrating categorical and dimensional frameworks, and analyze the lexical and paralinguistic cues that may have shaped annotators' emotion perception. Contribution/Results: We publicly release the SWB-Affect labels. Experiments reveal variable performance of state-of-the-art SER models on natural speech, with especially poor generalization for anger, underscoring the importance of naturalistic conversational data for robustness evaluation.

📝 Abstract
Understanding the nuances of speech emotion dataset curation and labeling is essential for assessing speech emotion recognition (SER) model potential in real-world applications. Most training and evaluation datasets contain acted or pseudo-acted speech (e.g., podcast speech) in which emotion expressions may be exaggerated or otherwise intentionally modified. Furthermore, datasets labeled based on crowd perception often lack transparency regarding the guidelines given to annotators. These factors make it difficult to understand model performance and pinpoint necessary areas for improvement. To address this gap, we identified the Switchboard corpus as a promising source of naturalistic conversational speech, and we trained a crowd to label the dataset for categorical emotions (anger, contempt, disgust, fear, sadness, surprise, happiness, tenderness, calmness, and neutral) and dimensional attributes (activation, valence, and dominance). We refer to this label set as Switchboard-Affect (SWB-Affect). In this work, we present our approach in detail, including the definitions provided to annotators and an analysis of the lexical and paralinguistic cues that may have played a role in their perception. In addition, we evaluate state-of-the-art SER models, and we find variable performance across the emotion categories with especially poor generalization for anger. These findings underscore the importance of evaluation with datasets that capture natural affective variations in speech. We release the labels for SWB-Affect to enable further analysis in this domain.
Problem

Research questions and friction points this paper is trying to address.

Addressing exaggerated emotions in acted speech datasets
Providing transparent emotion annotation guidelines for crowd labeling
Evaluating SER models on naturalistic conversational emotion data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used naturalistic conversational speech from the Switchboard corpus
Trained a crowd to label categorical emotions and dimensional attributes
Released SWB-Affect labels to enable transparent emotion analysis
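To make the label structure concrete, the sketch below models one labeled utterance with the 10 emotion categories and three dimensional attributes described in the abstract. This is a hypothetical illustration: the field names, rating scales, and `majority_category` aggregation are assumptions for exposition, not the released SWB-Affect schema or the paper's aggregation method.

```python
from collections import Counter
from dataclasses import dataclass

# The 10 categorical emotions listed in the abstract.
CATEGORIES = {"anger", "contempt", "disgust", "fear", "sadness",
              "surprise", "happiness", "tenderness", "calmness", "neutral"}

@dataclass
class UtteranceLabel:
    """Hypothetical per-utterance record (illustrative, not the actual schema)."""
    utterance_id: str
    category_votes: list   # one categorical pick per crowd annotator
    activation: float      # dimensional attributes; scale is an assumption
    valence: float
    dominance: float

    def majority_category(self) -> str:
        # Most frequent annotator vote; ties broken by first occurrence.
        return Counter(self.category_votes).most_common(1)[0][0]

label = UtteranceLabel("sw2001_0001",
                       ["happiness", "happiness", "neutral"],
                       activation=4.5, valence=5.2, dominance=3.8)
print(label.majority_category())  # -> happiness
```

A schema like this is one simple way to carry both the categorical and dimensional annotations together, which is what lets downstream work compare, for example, crowd-perceived anger against its activation and dominance ratings.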