Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis

📅 2025-08-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multimodal sentiment analysis (MSA) is vulnerable to spurious intra-modal and inter-modal correlations, which lead models to rely on statistical shortcuts rather than causal relationships and thereby limit generalization. To address this, the paper proposes the Multi-relational Multimodal Causal Intervention (MMCI) model, which applies the backdoor adjustment from causal theory to MSA. MMCI models the multimodal inputs as a multi-relational graph that explicitly captures intra- and inter-modal dependencies, then uses an attention mechanism to separate shortcut (bias-confounded) features from causal ones. It stratifies the shortcut features and dynamically combines them with the causal features, which encourages stable predictions under distribution shift. Experiments on several standard MSA datasets and out-of-distribution (OOD) test sets show that MMCI suppresses these biases and improves performance.

📝 Abstract

Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) model, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and inter-modal relations. Finally, by applying the backdoor adjustment, we stratify the shortcut features and dynamically combine them with the causal features to encourage MMCI to produce stable predictions under distribution shifts. Extensive experiments on several standard MSA datasets and out-of-distribution (OOD) test sets demonstrate that our method effectively suppresses biases and improves performance.
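
For reference, the backdoor adjustment named in the abstract has the standard form shown below. The symbols are assumptions chosen for illustration: X stands for the multimodal input (or its causal features), Y for the sentiment label, and S for the shortcut feature treated as the confounder. The abstract does not state the exact estimator MMCI uses, so this is the textbook identity rather than the paper's own equation.

```latex
P\big(Y \mid \mathrm{do}(X)\big) \;=\; \sum_{s} P\big(Y \mid X,\, S = s\big)\, P\big(S = s\big)
```

Read this way, "stratifying the shortcut features" corresponds to summing over the values s, so the prediction no longer depends on which shortcut happens to co-occur with a given input.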

Problem

Research questions and friction points this paper is trying to address.

Address spurious correlations in multimodal sentiment analysis

Disentangle causal and shortcut features across modalities

Improve generalization under distribution shifts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages backdoor adjustment for causal intervention

Models multimodal inputs as multi-relational graph

Disentangles causal and shortcut features via attention
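
The three ideas above can be illustrated with a minimal, standard-library-only sketch. It is not the paper's implementation: every function name is hypothetical, the complementary sigmoid masks are an assumed way to split attention into causal and shortcut parts, and the uniform prior over shortcut strata is likewise an assumption.

```python
import math
from itertools import product


def build_multi_relational_edges(lengths):
    """Type every directed edge between nodes as 'intra' or 'inter'.

    lengths maps a modality name to its number of nodes (e.g. time steps).
    """
    nodes = [(m, i) for m, n in lengths.items() for i in range(n)]
    edges = {"intra": [], "inter": []}
    for u, v in product(nodes, repeat=2):
        if u == v:
            continue  # no self-loops in this toy graph
        relation = "intra" if u[0] == v[0] else "inter"
        edges[relation].append((u, v))
    return edges


def split_attention(scores):
    """Assumed disentangling step: complementary soft masks per edge.

    causal weight = sigmoid(score); shortcut weight = 1 - causal weight.
    """
    causal = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    shortcut = [1.0 - c for c in causal]
    return causal, shortcut


def backdoor_predict(causal_feat, shortcut_strata, predictor, prior=None):
    """Approximate sum_s P(y | causal, s) P(s) over a finite set of strata."""
    if prior is None:
        # Assumption: uniform prior over the shortcut strata.
        prior = [1.0 / len(shortcut_strata)] * len(shortcut_strata)
    return sum(p * predictor(causal_feat, s)
               for p, s in zip(prior, shortcut_strata))


# Toy usage: two text nodes, one audio node, one visual node.
edges = build_multi_relational_edges({"text": 2, "audio": 1, "visual": 1})
causal_w, shortcut_w = split_attention([0.0, 2.0])
# A toy additive predictor; symmetric strata cancel out in the average.
y_hat = backdoor_predict(1.0, [-0.5, 0.5], lambda c, s: c + s)
```

In this toy setup the averaged prediction depends only on the causal feature, which is the intuition behind combining stratified shortcut features with causal ones to stabilize predictions under distribution shift.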

🔎 Similar Papers

Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach