Controlling Decision Drift in Multimodal Sentiment Analysis with Missing Modalities

📅 2026-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses multimodal sentiment analysis under missing or low-quality modalities, which often induce feature distribution shifts and unstable decisions. To mitigate these issues, the authors propose a two-level reference alignment framework that introduces stable references at both the feature representation stage and the sentiment decision stage. At the first level, complete-modality samples guide representation learning; at the second, a prototype retrieval and voting mechanism suppresses the influence of unreliable modalities, thereby enforcing cross-modal consistency. Applying reference alignment at both levels is intended to keep predictions stable across different modality combinations. On the CMU-MOSI and CMU-MOSEI datasets, the method reports state-of-the-art performance under full-modality settings, with accuracies of 86.28% and 85.88% and F1 scores of 86.24% and 85.86%, respectively, along with consistent improvements across various missing-modality scenarios.
📝 Abstract
Multimodal sentiment analysis relies on textual, acoustic, and visual signals, yet real-world data often suffer from missing modalities and imbalanced modality quality. Existing methods generate features for the missing modalities from the available ones, but differences in expression mechanisms and sentiment dynamics across modalities may cause the generated features to deviate from the true distributions and mislead prediction. In addition, unreliable modalities may dominate fusion, causing representations to shift across modality combinations and making sentiment representations unstable. To address these challenges, we propose a two-level reference alignment framework that introduces stable references at the feature representation and sentiment decision levels to improve robustness when modalities are missing. First-level reference alignment uses complete-modality samples to constrain representations and align different modality combinations into a shared sentiment space. Second-level reference alignment enforces cross-modal consistency at the decision level by suppressing unreliable modalities through prototype retrieval and voting. As a result, the framework maintains stable and reliable sentiment predictions under diverse missing-modality patterns. Experiments on CMU-MOSI and CMU-MOSEI show consistent improvements across various missing-modality settings. Under full-modality input, the proposed method achieves state-of-the-art performance, with accuracy (ACC) of 86.28% and 85.88% and F1 of 86.24% and 85.86% on the two datasets, respectively.
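To make the decision-level idea concrete, here is a minimal sketch of prototype retrieval followed by reliability-weighted voting. It is not the authors' implementation: the abstract does not specify how prototypes are built, how reliability is measured, or how votes are combined. The function names (`cosine_sim`, `prototype_vote`), the cosine similarity, the softmax temperature, and the top-two-margin reliability weight are all assumptions chosen for illustration.

```python
import numpy as np


def cosine_sim(x, protos):
    """Cosine similarity between one vector and each row of a prototype matrix."""
    x = x / (np.linalg.norm(x) + 1e-8)
    protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
    return protos @ x


def prototype_vote(features, prototypes, temperature=0.1):
    """Hypothetical decision-level voting over per-modality class prototypes.

    features:   dict modality -> 1-D feature vector, or None if the modality is missing
    prototypes: dict modality -> (num_classes, dim) array, num_classes >= 2
    Returns (predicted_class, class_scores).
    """
    num_classes = next(iter(prototypes.values())).shape[0]
    scores = np.zeros(num_classes)
    used = 0
    for name, feat in features.items():
        if feat is None:
            continue  # a missing modality casts no vote
        # Retrieval step: compare the feature with every class prototype.
        sims = cosine_sim(np.asarray(feat, dtype=float), prototypes[name])
        logits = sims / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Assumed reliability: margin between the two most likely classes,
        # so an ambiguous modality contributes a weaker vote.
        second, best = np.sort(probs)[-2:]
        scores += (best - second) * probs
        used += 1
    if used == 0:
        raise ValueError("at least one modality must be available")
    return int(np.argmax(scores)), scores


# Toy example: two classes, 2-D features, vision missing, audio ambiguous.
prototypes = {m: np.eye(2) for m in ("text", "audio", "vision")}
features = {
    "text": np.array([0.9, 0.1]),
    "audio": np.array([0.45, 0.55]),
    "vision": None,
}
pred, scores = prototype_vote(features, prototypes)
print(pred, scores)
```

In this toy setup the confident text vote outweighs the ambiguous audio vote, which mirrors the stated goal of keeping an unreliable modality from dominating the decision.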
Problem

Research questions and friction points this paper is trying to address.

multimodal sentiment analysis
modality missing
decision drift
representation shift
sentiment dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

reference alignment
modality missing
multimodal sentiment analysis
decision drift
prototype retrieval
Chenglizhao Chen
Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China); Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software
Yuchen Cao
Carnegie Mellon University
Spatial Computing, Computer Vision, Artificial Intelligence, Extended Reality
Xinyu Liu
Harbin Institute of Technology, China
Biped walking robot, Automation, Control, Mechatronics
Mengke Song
Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China); Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software
Guisheng Zhang
School of Electrical and Electronic Engineering, Shandong University of Technology
Deepfake detection, Deep learning
Xiaomin Yu
The Hong Kong University of Science and Technology (Guangzhou)