Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal multi-label emotion recognition (MMER) suffers from aleatoric uncertainty induced by modality-specific noise, which leads to feature ambiguity and ineffective multimodal fusion. To address this, the authors propose a probabilistic distribution disentanglement framework operating in a latent affective space. The approach decouples semantic representation learning from uncertainty modeling via a contrastive distribution disentanglement mechanism, and introduces an uncertainty-aware distribution-level fusion strategy that jointly accounts for label correlations and modality reliability. The method integrates contrastive learning, probabilistic latent distribution modeling, and a variational inference–inspired disentanglement strategy. Extensive experiments demonstrate state-of-the-art performance on the CMU-MOSEI and M$^3$ED benchmarks. Code is publicly available.

📝 Abstract
Multimodal multi-label emotion recognition (MMER) aims to identify the concurrent presence of multiple emotions in multimodal data. Existing studies primarily focus on improving fusion strategies and modeling modality-to-label dependencies. However, they often overlook the impact of aleatoric uncertainty, the inherent noise in multimodal data, which hinders the effectiveness of modality fusion by introducing ambiguity into feature representations. To address this issue and effectively model aleatoric uncertainty, this paper proposes the Latent emotional Distribution Decomposition with Uncertainty perception (LDDU) framework, built on a novel perspective of probabilistic modeling in the latent emotional space. Specifically, we introduce a contrastive disentangled distribution mechanism within the emotion space to model the multimodal data, allowing for the extraction of semantic features and uncertainty. Furthermore, we design an uncertainty-aware multimodal fusion method that accounts for the dispersed distribution of uncertainty and integrates distribution information. Experimental results show that LDDU achieves state-of-the-art performance on the CMU-MOSEI and M$^3$ED datasets, highlighting the importance of uncertainty modeling in MMER. Code is available at https://github.com/201983290498/lddu_mmer.git.
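The core idea of modeling each modality as a latent emotional distribution can be sketched as follows. This is a minimal, illustrative reconstruction under assumed names and shapes, not the paper's actual implementation: each modality feature is projected to a Gaussian whose mean carries the semantic content and whose variance captures aleatoric uncertainty, sampled with the standard reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_modality(x, W_mu, W_logvar):
    """Hypothetical encoder head: maps a modality feature vector to a
    Gaussian in the latent emotion space. The mean carries semantic
    content; the log-variance models aleatoric uncertainty."""
    mu = x @ W_mu          # semantic representation
    logvar = x @ W_logvar  # per-dimension uncertainty
    return mu, logvar

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, so sampling
    stays differentiable w.r.t. mu and logvar during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy dimensions: a 16-d modality feature projected into an 8-d emotion space.
x = rng.standard_normal(16)
W_mu = rng.standard_normal((16, 8)) * 0.1
W_logvar = rng.standard_normal((16, 8)) * 0.1

mu, logvar = encode_modality(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
print(z.shape)  # (8,)
```

In the full method, a contrastive objective over these per-modality distributions would then pull semantically matching samples together while keeping the uncertainty component disentangled.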
Problem

Research questions and friction points this paper is trying to address.

Addressing aleatoric uncertainty in multimodal data affecting emotion recognition accuracy
Modeling latent emotional space probabilistically to disentangle semantic features and uncertainty
Developing uncertainty-aware fusion methods for improved multimodal emotion distribution integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic latent emotional space modeling for uncertainty decomposition
Contrastive disentangled mechanism extracts features and uncertainty
Uncertainty-aware fusion integrates dispersed distribution information
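One simple way to realize uncertainty-aware distribution-level fusion, shown here as a generic stand-in for the paper's strategy rather than its exact formulation, is precision (inverse-variance) weighting: modalities whose latent distributions are noisier contribute less to the fused representation.

```python
import numpy as np

def uncertainty_aware_fusion(mus, logvars):
    """Illustrative fusion of per-modality Gaussians by precision
    weighting: each latent dimension is averaged across modalities
    with weights proportional to 1/sigma^2, so high-uncertainty
    modalities are down-weighted."""
    precisions = np.exp(-np.asarray(logvars))       # 1 / sigma^2 per modality
    weights = precisions / precisions.sum(axis=0)   # normalize over modalities
    fused_mu = (weights * np.asarray(mus)).sum(axis=0)
    fused_var = 1.0 / precisions.sum(axis=0)        # fused uncertainty
    return fused_mu, fused_var

# Two modalities over a 4-d latent space; the second is far noisier.
mus = [np.array([1.0, 0.0, 0.5, -0.5]),
       np.array([-1.0, 2.0, 0.0, 0.5])]
logvars = [np.zeros(4),               # sigma^2 = 1
           np.full(4, np.log(9.0))]   # sigma^2 = 9
fused_mu, fused_var = uncertainty_aware_fusion(mus, logvars)
print(fused_mu)  # pulled toward the low-uncertainty first modality
```

With variances 1 and 9, the weights are 0.9 and 0.1, so the fused mean sits close to the reliable modality; the fused variance also drops below either input's, reflecting the combined evidence.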
Jingwang Huang
College of Computer Science, Chongqing University, China
Jiang Zhong
College of Computer Science, Chongqing University, China
Qin Lei
College of Computer Science, Chongqing University, China
Jinpeng Gao
College of Computer Science, Chongqing University, China
Yuming Yang
Fudan University
Natural Language Processing · Large Language Models
Sirui Wang
Meituan
NLP · LLM
Peiguang Li
Meituan Group
Natural Language Processing
Kaiwen Wei
College of Computer Science, Chongqing University, China