Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis

2025-08-06
Citations: 0
Influential: 0
AI Summary
Multimodal sentiment analysis (MSA) is vulnerable to spurious intra-modal and inter-modal correlations, which lead models to rely on statistical shortcuts rather than causal relationships and thereby limit generalization. To address this, we propose the Multi-relational Multimodal Causal Intervention (MMCI) model, which applies backdoor adjustment to MSA. MMCI explicitly models intra- and inter-modal dependencies via a multi-relational graph and employs an attention mechanism to disentangle shortcut (bias-confounded) features from causally relevant ones. It then stratifies the shortcut features and dynamically combines them with the causal features to suppress non-causal pathways. Experiments on several standard MSA benchmarks and out-of-distribution (OOD) test sets indicate that MMCI reduces reliance on statistical shortcuts and improves robustness and generalization. The result is a causally grounded approach to multimodal modeling.

Abstract
Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) model, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and inter-modal relations. Finally, by applying the backdoor adjustment, we stratify the shortcut features and dynamically combine them with the causal features to encourage MMCI to produce stable predictions under distribution shifts. Extensive experiments on several standard MSA datasets and out-of-distribution (OOD) test sets demonstrate that our method effectively suppresses biases and improves performance.
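For readers unfamiliar with the backdoor adjustment mentioned in the abstract, its standard form from causal inference is given below. The symbols are assumptions chosen for illustration, not the paper's notation: X stands for the causal features, S for the shortcut features acting as a confounder, and Y for the sentiment label. The identity holds when S satisfies the backdoor criterion relative to X and Y.

```latex
\[
P\bigl(Y \mid \mathrm{do}(X)\bigr) \;=\; \sum_{s} P\bigl(Y \mid X,\, S = s\bigr)\, P(S = s)
\]
```

Read this way, "stratifying the shortcut features" corresponds to enumerating the values s, and the prediction is averaged over those strata instead of being conditioned on whichever shortcut happens to co-occur with the input.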
Problem

Research questions and friction points this paper is trying to address.

Address spurious correlations in multimodal sentiment analysis
Disentangle causal and shortcut features across modalities
Improve generalization under distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages backdoor adjustment for causal intervention
Models multimodal inputs as a multi-relational graph
Disentangles causal and shortcut features via attention
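The points above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sigmoid gate standing in for attention, the additive fusion, and the uniform weighting of strata are all assumptions made for illustration. It soft-splits one feature vector into a causal part and a shortcut part, then approximates the backdoor adjustment by averaging predictions over a set of shortcut strata.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_features(h, gate_logits):
    """Soft-split a feature vector into causal and shortcut parts.

    gate_logits is a hypothetical stand-in for attention scores; each gate
    lies in (0, 1), and the two parts sum back to the original vector.
    """
    gates = [sigmoid(g) for g in gate_logits]
    causal = [g * x for g, x in zip(gates, h)]
    shortcut = [(1.0 - g) * x for g, x in zip(gates, h)]
    return causal, shortcut


def classify(feature, weights):
    """Linear classifier followed by softmax; weights holds one row per class."""
    return softmax([sum(w * x for w, x in zip(row, feature)) for row in weights])


def backdoor_predict(causal, shortcut_strata, weights):
    """Average P(y | causal, s) over strata s, assuming P(s) is uniform."""
    avg = [0.0] * len(weights)
    for s in shortcut_strata:
        combined = [c + x for c, x in zip(causal, s)]  # additive fusion (assumed)
        probs = classify(combined, weights)
        avg = [a + p / len(shortcut_strata) for a, p in zip(avg, probs)]
    return avg


# Toy usage with made-up numbers.
h = [0.5, -1.0, 2.0]
causal, shortcut = split_features(h, gate_logits=[2.0, 0.0, -2.0])
strata = [shortcut, [0.1, 0.2, -0.3], [0.0, 0.0, 0.0]]
weights = [[1.0, 0.0, -1.0], [-1.0, 0.5, 1.0]]
print(backdoor_predict(causal, strata, weights))
```

Because the same causal part is paired with several different shortcut parts before averaging, a classifier evaluated this way gains little from any single shortcut, which is the intuition behind stable predictions under distribution shift.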
Menghua Jiang
School of Computer Science, South China Normal University
Yuxia Lin
School of Computer Science, South China Normal University
Baoliang Chen
School of Computer Science, South China Normal University
Haifeng Hu
Sun Yat-sen University
Yuncheng Jiang
West China Hospital, Sichuan University
Computer Vision, Medical Image Analysis
Sijie Mai
School of Computer Science, South China Normal University