Mixture of Disentangled Experts with Missing Modalities for Robust Multimodal Sentiment Analysis

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of multimodal sentiment analysis in real-world scenarios caused by missing or corrupted modalities. Existing disentanglement methods struggle in this setting because they cannot effectively model the dynamic heterogeneity that arises under uncertain modality loss. The authors propose the DERL framework, which uses a mixture-of-experts mechanism to adaptively disentangle multimodal inputs into orthogonal private and shared representations. A multi-level reconstruction strategy provides collaborative supervision, and the disentangled features then drive importance-aware robust fusion. Experiments show that the approach improves representation capability and robustness under modality absence and achieves state-of-the-art performance on benchmarks such as MOSI, with a 2.47% improvement in Acc-2 and a 2.25% reduction in MAE under intra-modality missing conditions.
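For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Acc-2 and MAE are commonly computed for sentiment regression on MOSI-style labels (continuous scores in [-3, 3]). The function names and sample values are hypothetical, and the negative vs. non-negative split used for Acc-2 is an assumption: some papers instead exclude neutral samples, and the summary does not say which convention DERL reports.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error between predicted and gold sentiment scores."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))

def acc2(pred, target):
    """Binary accuracy, assuming a negative (< 0) vs. non-negative (>= 0) split."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred >= 0) == (target >= 0)))

# Hypothetical predictions and labels, for illustration only.
preds = [1.2, -0.4, 0.3, -2.1]
labels = [2.0, 0.6, 1.0, -1.8]

print(f"MAE   = {mae(preds, labels):.2f}")
print(f"Acc-2 = {acc2(preds, labels):.2f}")
```

Lower MAE and higher Acc-2 are better, which is why the summary describes one figure as a reduction and the other as an improvement.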

📝 Abstract
Multimodal Sentiment Analysis (MSA) integrates multiple modalities to infer human sentiment, but real-world noise often leads to missing or corrupted data. However, existing feature-disentangled methods struggle to handle the internal variations of heterogeneous information under uncertain missingness, making it difficult to learn effective multimodal representations from degraded modalities. To address this issue, we propose DERL, a Disentangled Expert Representation Learning framework for robust MSA. Specifically, DERL employs hybrid experts to adaptively disentangle multimodal inputs into orthogonal private and shared representation spaces. A multi-level reconstruction strategy is further developed to provide collaborative supervision, enhancing both the expressiveness and robustness of the learned representations. Finally, the disentangled features act as modality experts with distinct roles to generate importance-aware fusion results. Extensive experiments on two MSA benchmarks demonstrate that DERL outperforms state-of-the-art methods under various missing-modality conditions. For instance, our method achieves improvements of 2.47% in Acc-2 and 2.25% in MAE on MOSI under intra-modal missingness.
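The abstract's core idea (experts that split an input into private and shared representations kept orthogonal to each other) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear experts, the softmax gate, the squared-Frobenius orthogonality penalty, and every name and dimension below are assumptions chosen for illustration, and zeroing half of the features is only a crude stand-in for a degraded modality.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_project(x, experts, gate):
    """Gate-weighted sum of linear experts.

    x: (batch, d_in); experts: (n_experts, d_in, d_out); gate: (d_in, n_experts).
    """
    weights = softmax(x @ gate)                        # (batch, n_experts)
    outputs = np.einsum("bi,eio->beo", x, experts)     # (batch, n_experts, d_out)
    return np.einsum("be,beo->bo", weights, outputs)   # (batch, d_out)

def orthogonality_penalty(private, shared):
    """Squared Frobenius norm of private^T @ shared.

    It is zero when every private feature dimension is orthogonal,
    over the batch, to every shared feature dimension.
    """
    return float(np.sum((private.T @ shared) ** 2))

# Hypothetical sizes and randomly initialised parameters.
batch, d_in, d_out, n_experts = 8, 16, 4, 3
x = rng.normal(size=(batch, d_in))
x[:, d_in // 2:] = 0.0  # assumed stand-in for partially missing features

private_experts = rng.normal(size=(n_experts, d_in, d_out))
shared_experts = rng.normal(size=(n_experts, d_in, d_out))
private_gate = rng.normal(size=(d_in, n_experts))
shared_gate = rng.normal(size=(d_in, n_experts))

private = moe_project(x, private_experts, private_gate)
shared = moe_project(x, shared_experts, shared_gate)
penalty = orthogonality_penalty(private, shared)
print(private.shape, shared.shape, penalty >= 0.0)
```

In a trained model a penalty of this kind would be minimised alongside the task and reconstruction losses; here the parameters are random, so the value only shows that the quantity is well defined and non-negative.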

Problem

Research questions and friction points this paper is trying to address.

Multimodal Sentiment Analysis
Missing Modalities
Feature Disentanglement
Robust Representation Learning
Heterogeneous Information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled Representation
Mixture of Experts
Missing Modalities
Multimodal Sentiment Analysis
Robust Fusion
Xiang Li
Beihang University
multimodal sentiment analysis, multimodal fusion
Xiaoming Zhang
Beihang University
Dezhuang Miao
School of Cyber Science and Technology, Beihang University, Beijing, 100191, China
Xianfu Cheng
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Dawei Li
School of Cyber Science and Technology, Beihang University, Beijing, 100191, China
Honggui Han
School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
Zhoujun Li
Beihang University
Artificial Intelligence, Natural Language Processing, Network Security