Modality-Aware Contrastive and Uncertainty-Regularized Emotion Recognition

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses multimodal emotion recognition when modalities differ in semantics, vary in quality, and are sometimes missing, which leads to inconsistent representations and unreliable emotion predictions. The authors propose the MCUR framework, which combines two contrastive objectives, one over modality combinations and one over emotion categories, to improve cross-modal representation consistency. A sample-level, uncertainty-guided regularization mechanism additionally assigns adaptive per-sample weights during training to improve robustness. Experiments on the MOSI, MOSEI, and IEMOCAP datasets report average F1-score improvements of 2.2%, 2.67%, and 4.37%, respectively, over existing methods.
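As a rough illustration of sample-level uncertainty weighting, the sketch below derives a per-sample weight from the entropy of the predicted class distribution and uses it to reweight a cross-entropy loss. The entropy-based uncertainty measure, the direction of the weighting (confident samples count more), and the function names `uncertainty_weights` and `weighted_cross_entropy` are all assumptions made for illustration; the summary does not specify how MCUR computes or applies its weights.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uncertainty_weights(logits):
    """Hypothetical per-sample weights in [0, 1] from predictive entropy.

    Assumption: uncertainty is the entropy of the softmax output,
    normalized by its maximum log(num_classes). This is not taken
    from the paper.
    """
    p = softmax(logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    max_entropy = np.log(logits.shape[-1])
    # Low entropy (confident prediction) gives a weight near 1.
    return np.clip(1.0 - entropy / max_entropy, 0.0, 1.0)

def weighted_cross_entropy(logits, labels):
    # Per-sample negative log-likelihood of the true class.
    p = softmax(logits)
    nll = -np.log(p[np.arange(len(labels)), labels] + 1e-12)
    w = uncertainty_weights(logits)
    # Weighted average, so uncertain samples contribute less.
    return float((w * nll).sum() / (w.sum() + 1e-12))

# Toy batch: one confident prediction, one nearly uniform prediction.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0]])
labels = np.array([0, 0])
weights = uncertainty_weights(logits)
loss = weighted_cross_entropy(logits, labels)
```

In this toy batch the first sample receives a much larger weight than the second, so the nearly uniform prediction contributes little to the averaged loss. In a real training loop the weights would typically be treated as constants (no gradient flows through them), which is another design choice not confirmed by the source.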

📝 Abstract

Multimodal Emotion Recognition (MER) has attracted growing attention with the rapid advancement of human-computer interaction. However, different modalities exhibit substantial discrepancies in semantics, quality, and availability, leading to highly heterogeneous modality combinations and posing significant challenges to achieving consistent and reliable emotion understanding. To address this challenge, we propose the Modality-Aware Contrastive and Uncertainty-Regularized (MCUR) framework, which approaches MER from the perspective of representation consistency, aiming to enable robust emotion prediction across heterogeneous modality combinations. MCUR incorporates two core components: (1) a Modality Combination-Based and Category-Based Contrastive Learning mechanism (MCB-CL), which encourages samples with the same emotion category and the same available modalities to be close in the representation space; and (2) Sample-wise Uncertainty-Guided Regularization (SUGR), which adaptively assigns uncertainty-based weights to individual samples to optimize training. Extensive experiments demonstrate that MCUR consistently outperforms existing methods, achieving average F1 gains of 2.2% on MOSI, 2.67% on MOSEI, and 4.37% on IEMOCAP.
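The positive-pair rule described for MCB-CL (same emotion category and same available modalities) can be sketched as a supervised-contrastive-style loss. This is a minimal sketch under stated assumptions, not the authors' implementation: the InfoNCE-style formulation, the temperature value, the binary availability vector (assumed order: text, audio, vision), and the function name `mcb_contrastive_loss` are all hypothetical.

```python
import numpy as np

def mcb_contrastive_loss(z, labels, modality_masks, temperature=0.1):
    """Hypothetical contrastive loss: positives share label AND modality set.

    z:              (n, d) sample embeddings
    labels:         (n,)   integer emotion categories
    modality_masks: (n, m) binary availability per modality (assumed layout)
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)

    same_label = labels[:, None] == labels[None, :]
    same_combo = (modality_masks[:, None, :] == modality_masks[None, :, :]).all(axis=-1)
    positives = same_label & same_combo & ~self_mask

    # Log-softmax over all other samples (self-similarity excluded).
    sim = np.where(self_mask, -np.inf, sim)
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive partner in this batch
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    # Average over positives per anchor, then over anchors with positives.
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())

# Toy batch: two samples per (label, modality-combination) group.
labels = np.array([0, 0, 1, 1])
masks = np.array([[1, 1, 1], [1, 1, 1], [1, 0, 1], [1, 0, 1]])
z_aligned = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
z_mixed = z_aligned[[0, 2, 1, 3]]  # same embeddings, groups scrambled

loss_aligned = mcb_contrastive_loss(z_aligned, labels, masks)
loss_mixed = mcb_contrastive_loss(z_mixed, labels, masks)
```

When embeddings of the same group sit close together (`z_aligned`), the loss is lower than when the groups are scrambled (`z_mixed`), which is the behavior the objective is meant to reward. Requiring a match on both label and modality combination is the strictest reading of the abstract; a variant with separate combination-based and category-based terms would be equally consistent with it.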

Problem

Research questions and friction points this paper is trying to address.

Multimodal Emotion Recognition

Modality Heterogeneity

Representation Consistency

Emotion Understanding

Modality Discrepancy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-Aware Contrastive Learning

Uncertainty-Regularized Learning

Multimodal Emotion Recognition