Mouth Articulation-Based Anchoring for Improved Cross-Corpus Speech Emotion Recognition

📅 2024-12-27
🤖 AI Summary
Cross-corpus speech emotion recognition suffers from unstable acoustic features and limited generalizability due to speaker variability, domain shift, and heterogeneous recording conditions. To address this, we propose a contrastive learning framework anchored on articulatory mouth movements—introducing physiologically interpretable articulatory dynamics as the core alignment signal for cross-domain emotional representation, replacing conventional acoustic feature alignment. Our method integrates lip-motion modeling, acoustic-visual disentangled representation learning, and joint contrastive training across multiple corpora (CREMA-D and MSP-IMPROV). Experimental results demonstrate substantial improvements in cross-corpus emotion recognition accuracy, validating that mouth articulation provides a stable, consistent, and generalizable cue for emotion representation. This work establishes a novel paradigm for unsupervised cross-domain speech emotion recognition grounded in articulatory physiology.
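The anchored contrastive objective described above can be sketched as an InfoNCE-style loss that pulls each utterance's acoustic embedding toward its own lip-motion (articulatory) embedding and pushes it away from the lip embeddings of other utterances in the batch. This is a minimal illustrative sketch, not the authors' implementation: the function name, embedding shapes, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def articulation_anchored_info_nce(acoustic_emb, lip_emb, temperature=0.1):
    """Hypothetical InfoNCE loss with lip-motion embeddings as anchors.

    acoustic_emb: (B, D) acoustic utterance embeddings
    lip_emb:      (B, D) articulatory (lip-motion) embeddings
    The positive pair for row i is (acoustic_emb[i], lip_emb[i]);
    all other rows in the batch serve as negatives.
    """
    a = F.normalize(acoustic_emb, dim=-1)       # unit-norm acoustic vectors
    v = F.normalize(lip_emb, dim=-1)            # unit-norm articulatory vectors
    logits = a @ v.t() / temperature            # (B, B) cosine similarities
    targets = torch.arange(a.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: a batch of 4 utterances with 16-dim embeddings,
# e.g. drawn from two different corpora during joint training.
acoustic = torch.randn(4, 16)
lip = torch.randn(4, 16)
loss = articulation_anchored_info_nce(acoustic, lip)
```

In a cross-corpus setting, batches mixing CREMA-D and MSP-IMPROV utterances would let the articulatory anchor act as the corpus-invariant alignment signal while the acoustic encoder adapts to it.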

📝 Abstract
Cross-corpus speech emotion recognition (SER) plays a vital role in numerous practical applications. Traditional approaches to cross-corpus emotion transfer often concentrate on adapting acoustic features to align with different corpora, domains, or labels. However, acoustic features are inherently variable and error-prone due to factors such as speaker differences, domain shifts, and recording conditions. To address these challenges, this study adopts a novel contrastive approach that focuses on emotion-specific articulatory gestures as the core elements for analysis. By shifting the emphasis to these more stable and consistent articulatory gestures, we aim to enhance emotion transfer learning in SER tasks. Our research uses the CREMA-D and MSP-IMPROV corpora as benchmarks and reveals valuable insights into the commonality and reliability of these articulatory gestures. The findings highlight the potential of mouth articulatory gestures as a better constraint for improving emotion recognition across different settings or domains.
Problem

Research questions and friction points this paper is trying to address.

Cross-corpus Speech Emotion Recognition
Speaker Variability
Environmental Changes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mouth Movement Features
Cross-Corpus Emotion Recognition
Accuracy Improvement
Shreya G. Upadhyay
National Tsing Hua University
Machine Learning · Affective Computing · Behavioral Speech Signal Processing · Speech Emotion
Ali N. Salman
Department of Electrical and Computer Engineering, University of Texas at Dallas, USA
Carlos Busso
Department of Electrical and Computer Engineering, University of Texas at Dallas, USA; Language Technologies Institute, Carnegie Mellon University, USA
Chi-Chun Lee
Department of Electrical Engineering, National Tsing Hua University, Taiwan