🤖 AI Summary
This work addresses the performance degradation that multimodal sentiment analysis suffers in real-world scenarios when modalities are missing or corrupted. Existing disentanglement methods struggle in this setting because they cannot effectively model the dynamic heterogeneity introduced by uncertain modality loss. To this end, we propose the DERL framework, which leverages a mixture-of-experts mechanism to adaptively disentangle multimodal inputs into orthogonal private and shared representations. A multi-level reconstruction strategy provides collaborative supervision, and the disentangled features drive importance-aware robust fusion. Extensive experiments demonstrate that our approach significantly enhances representation capability and robustness under modality absence, achieving state-of-the-art performance on benchmarks such as MOSI, including a 2.47% improvement in Acc-2 and a 2.25% reduction in MAE under intra-modality missing conditions.
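To make the disentanglement idea concrete, here is a minimal sketch (not the authors' code) of splitting one modality's features into a private and a shared subspace, with a soft orthogonality penalty keeping the two apart. Module names, dimensions, and the exact form of the penalty are illustrative assumptions, not the paper's formulation.

```python
# Hypothetical sketch of private/shared disentanglement with an
# orthogonality penalty; names and dimensions are assumptions.
import torch
import torch.nn as nn

class DisentangleBlock(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.private_enc = nn.Linear(dim, dim)  # modality-specific subspace
        self.shared_enc = nn.Linear(dim, dim)   # cross-modal shared subspace

    def forward(self, x: torch.Tensor):
        p = self.private_enc(x)
        s = self.shared_enc(x)
        # Soft orthogonality: penalize the squared entries of p^T s, pushing
        # the private and shared representations toward orthogonal spaces.
        ortho_loss = (p.transpose(-2, -1) @ s).pow(2).mean()
        return p, s, ortho_loss

x = torch.randn(8, 50, 128)           # (batch, sequence, feature) toy input
p, s, loss = DisentangleBlock()(x)    # loss would be added to the objective
```

In a full system the penalty would typically be weighted and summed with the task and reconstruction losses; the weighting scheme here is left open since the source does not specify it.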
📝 Abstract
Multimodal Sentiment Analysis (MSA) integrates multiple modalities to infer human sentiment, but real-world noise often leaves the input data missing or corrupted. Existing feature-disentanglement methods struggle to handle the internal variations of heterogeneous information under such uncertain missingness, making it difficult to learn effective multimodal representations from degraded modalities. To address this issue, we propose DERL, a Disentangled Expert Representation Learning framework for robust MSA. Specifically, DERL employs hybrid experts to adaptively disentangle multimodal inputs into orthogonal private and shared representation spaces. A multi-level reconstruction strategy is further developed to provide collaborative supervision, enhancing both the expressiveness and robustness of the learned representations. Finally, the disentangled features act as modality experts with distinct roles to generate importance-aware fusion results. Extensive experiments on two MSA benchmarks demonstrate that DERL outperforms state-of-the-art methods under various missing-modality conditions; for instance, it improves Acc-2 by 2.47% and reduces MAE by 2.25% on MOSI under intra-modal missingness.
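The importance-aware fusion step can be pictured as a learned gate over per-modality experts: each modality contributes a pooled representation, and softmax weights decide how much each expert matters, so degraded or missing modalities can be down-weighted. The sketch below is a hedged illustration of that idea; the gating design is an assumption, not DERL's exact mechanism.

```python
# Hypothetical importance-aware fusion over modality experts;
# the scoring/gating design is an assumption for illustration.
import torch
import torch.nn as nn

class ImportanceFusion(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-expert importance score

    def forward(self, experts: torch.Tensor):
        # experts: (batch, n_modalities, dim) pooled per-modality features
        w = torch.softmax(self.score(experts), dim=1)  # (batch, n_mod, 1)
        return (w * experts).sum(dim=1)                # weighted fusion

experts = torch.randn(8, 3, 128)      # toy text / audio / vision experts
fused = ImportanceFusion()(experts)   # (8, 128) fused representation
```

Because the weights are computed from the expert features themselves, an expert built from a missing or corrupted modality can receive a near-zero weight, which is one plausible way such a gate yields robustness under modality absence.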