Leveraging Self-Paced Curriculum Learning for Enhanced Modality Balance in Multimodal Conversational Emotion Recognition

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multimodal conversational emotion recognition is often hindered by modality misalignment and imbalanced learning, which impede effective fusion. To address these challenges, this work proposes a plug-and-play Self-Paced Curriculum Learning (SPCL) framework that schedules training samples dynamically, using a dual-level difficulty measure computed at both the utterance and conversation levels. By guiding the model from easier to harder examples, SPCL improves modality balance and robustness. It is designed as a lightweight module that can be added to existing architectures. Experiments on the IEMOCAP and MELD datasets show consistent weighted F1 improvements over baseline models: roughly +1.2% to +6.6% on IEMOCAP and up to +10.4% on MELD.

📝 Abstract

Multimodal Emotion Recognition in Conversations (MERC) is a crucial task for understanding human interactions, where multimodal approaches integrating language, facial expressions, and vocal tone have achieved significant progress. However, modality misalignment and imbalanced learning remain major challenges, limiting the effective utilization of multimodal information. To address this issue, we propose a plug-and-play framework based on Self-Paced Curriculum Learning (SPCL) for MERC. We introduce a dual-level Difficulty Measurer that captures both utterance-level and conversation-level challenges. The utterance-level score models fine-grained modality-specific difficulty, while the conversation-level score captures broader dialogue structures, including emotional dependencies and modality coherence. Based on these scores, the Learning Scheduler dynamically guides training from easier to more difficult instances. By integrating SPCL into existing MERC architectures, our method alleviates modality imbalance and improves model robustness. Extensive experiments on the IEMOCAP and MELD datasets demonstrate consistent improvements across different architectures and modality settings. On IEMOCAP, SPCL improves weighted F1-score by approximately +1.2% to +6.6% over baseline models, while on MELD, gains reach up to +10.4%. These results highlight the effectiveness and generalizability of SPCL as a lightweight plug-and-play module for multimodal emotion recognition.
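
The easy-to-hard scheduling described in the abstract can be illustrated with a small sketch. The abstract does not give the formulas for the difficulty scores or the scheduler, so everything below is an assumption made for illustration: the function names, the blending weight `alpha`, the linear pacing schedule, and the use of fixed precomputed scores are all hypothetical and should not be read as the paper's implementation.

```python
import numpy as np

def combined_difficulty(utt_scores, conv_scores, conv_ids, alpha=0.5):
    """Blend each utterance's difficulty with that of its parent conversation.

    Assumption: a simple weighted average; the paper's actual combination
    rule is not stated in the abstract.
    """
    utt = np.asarray(utt_scores, dtype=float)
    conv = np.asarray([conv_scores[c] for c in conv_ids], dtype=float)
    return alpha * utt + (1.0 - alpha) * conv

def pacing_fraction(epoch, total_epochs, start=0.3, end=1.0):
    """Fraction of the training set admitted at this epoch (linear pacing, assumed)."""
    if total_epochs <= 1:
        return end
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return (1.0 - frac) * start + frac * end

def select_samples(difficulty, epoch, total_epochs):
    """Return sorted indices of the easiest samples allowed at this epoch."""
    n = len(difficulty)
    keep = int(np.ceil(pacing_fraction(epoch, total_epochs) * n))
    keep = min(n, max(1, keep))
    order = np.argsort(difficulty, kind="stable")  # easiest first
    return np.sort(order[:keep])

# Toy data: four utterances from two conversations (values are made up).
difficulty = combined_difficulty(
    utt_scores=[0.1, 0.9, 0.4, 0.6],
    conv_scores={0: 0.2, 1: 0.8},
    conv_ids=[0, 0, 1, 1],
)
for epoch in range(3):
    print(epoch, select_samples(difficulty, epoch, total_epochs=3).tolist())
```

In this toy run the admitted subset grows each epoch until the whole training set is used. A self-paced variant would typically recompute difficulty from the model's current losses rather than keep fixed scores; that step is omitted here.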

Problem

Research questions and friction points this paper is trying to address.

Multimodal Emotion Recognition

Modality Imbalance

Modality Misalignment

Conversational Emotion Recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Paced Curriculum Learning

Multimodal Emotion Recognition

Modality Balance

Difficulty Measurer