🤖 AI Summary
This work addresses feature misalignment in multimodal sentiment analysis caused by missing modalities in real-world scenarios, which degrades representation quality. To mitigate this issue, the authors propose a Progressive Representation Learning Framework (PRLF) that dynamically assesses modality reliability through an adaptive estimator grounded in recognition confidence and Fisher information. Using the most reliable (dominant) modality as an anchor, PRLF employs a progressive cross-modal interaction mechanism to iteratively align the representations of the remaining modalities, suppressing noise and enhancing robustness. Extensive experiments on the CMU-MOSI, CMU-MOSEI, and SIMS datasets show that PRLF consistently outperforms state-of-the-art methods under both intra- and inter-modality missing conditions.
📝 Abstract
Multimodal Sentiment Analysis (MSA) seeks to infer human emotions by integrating textual, acoustic, and visual cues. However, existing approaches typically assume that all modalities are complete, whereas real-world applications frequently encounter noise, hardware failures, or privacy restrictions that result in missing modalities. A significant feature misalignment arises between incomplete and complete modalities, and directly fusing them may even distort the well-learned representations of the intact modalities. To this end, we propose PRLF, a Progressive Representation Learning Framework designed for MSA under uncertain missing-modality conditions. PRLF introduces an Adaptive Modality Reliability Estimator (AMRE), which dynamically quantifies the reliability of each modality using recognition confidence and Fisher information to determine the dominant modality. In addition, a Progressive Interaction (ProgInteract) module iteratively aligns the other modalities with the dominant one, enhancing cross-modal consistency while suppressing noise. Extensive experiments on CMU-MOSI, CMU-MOSEI, and SIMS verify that PRLF outperforms state-of-the-art methods in both inter- and intra-modality missing scenarios, demonstrating its robustness and generalization capability.
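The abstract describes AMRE only at a high level. As a minimal sketch of the idea, one could score each modality by combining recognition confidence (here assumed to be the mean max-softmax probability) with a Fisher-information proxy (here assumed to be the mean squared gradient of the log-likelihood), then pick the highest-scoring modality as the anchor. The scoring formula, the `reliability` function, and the toy logits below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reliability(logits, fisher_proxy):
    """Hypothetical reliability score: mean max-softmax confidence
    times a Fisher-information proxy (e.g. the mean squared gradient
    of the per-modality log-likelihood)."""
    confidence = softmax(logits).max(axis=-1).mean()
    return confidence * fisher_proxy

# Toy example: per-sample class logits for three modalities.
# Sharper logits and a larger Fisher proxy -> higher reliability.
rng = np.random.default_rng(0)
scores = {
    "text":  reliability(rng.normal(0, 3.0, (8, 3)), fisher_proxy=2.0),
    "audio": reliability(rng.normal(0, 0.5, (8, 3)), fisher_proxy=0.5),
    "video": reliability(rng.normal(0, 0.5, (8, 3)), fisher_proxy=0.4),
}
anchor = max(scores, key=scores.get)  # dominant modality
```

Under this sketch, the remaining modalities would then be aligned toward `anchor` over successive interaction steps, which is the role the abstract assigns to ProgInteract.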