AI Summary
Multimodal sentiment analysis (MSA) is vulnerable to spurious intra-modal and inter-modal correlations, which lead models to rely on statistical shortcuts rather than causal relationships and thereby limit generalization. To address this, we propose the Multi-relational Multimodal Causal Intervention (MMCI) model, the first to introduce backdoor adjustment into MSA. MMCI explicitly models intra- and inter-modal dependencies via a multi-relational graph and employs a causal attention mechanism to disentangle bias-confounded features from causally relevant ones. Coupled with hierarchical feature decomposition and dynamic fusion, it suppresses non-causal pathways. Extensive experiments on multiple standard benchmarks and out-of-distribution test sets demonstrate that MMCI significantly reduces reliance on statistical shortcuts, achieving superior robustness and generalization. This work establishes a novel, interpretable, and causally grounded paradigm for multimodal modeling.
Abstract
Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) model, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and inter-modal relations. Finally, by applying the backdoor adjustment, we stratify the shortcut features and dynamically combine them with the causal features to encourage MMCI to produce stable predictions under distribution shifts. Extensive experiments on several standard MSA datasets and out-of-distribution (OOD) test sets demonstrate that our method effectively suppresses biases and improves performance.
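The backdoor adjustment mentioned above replaces the observational quantity P(y | c) with the interventional one, P(y | do(c)) = Σ_k P(y | c, s_k) P(s_k), where c denotes the causal features and s_k ranges over strata of the shortcut (confounder) features. The following is a minimal sketch of that stratified average, not the authors' implementation. The function name `backdoor_adjusted_predict`, the linear-softmax classifier, the concatenation used to combine causal and shortcut features, and the uniform prior in the usage lines are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def backdoor_adjusted_predict(causal, shortcut_strata, prior, W, b):
    """Approximate P(y | do(c)) = sum_k P(y | c, s_k) * P(s_k).

    causal:          (d,)     causal feature vector c
    shortcut_strata: (K, d)   one representative shortcut feature per stratum
    prior:           (K,)     P(s_k), assumed to sum to 1
    W, b:            (2d, C), (C,)  hypothetical linear classifier parameters
    """
    K = shortcut_strata.shape[0]
    # Pair the same causal feature with every shortcut stratum
    # (concatenation is an assumed stand-in for the paper's fusion step).
    paired = np.concatenate([np.tile(causal, (K, 1)), shortcut_strata], axis=1)
    # P(y | c, s_k) for each stratum k, shape (K, C).
    per_stratum = softmax(paired @ W + b)
    # Weight each stratum by its prior and sum over strata, shape (C,).
    return prior @ per_stratum


# Usage with random toy values (hypothetical dimensions).
rng = np.random.default_rng(0)
d, K, C = 4, 3, 2
causal = rng.normal(size=d)
strata = rng.normal(size=(K, d))
prior = np.full(K, 1.0 / K)  # assumed uniform prior over strata
W = rng.normal(size=(2 * d, C))
b = np.zeros(C)
probs = backdoor_adjusted_predict(causal, strata, prior, W, b)
print(probs, probs.sum())
```

Because the same causal feature is paired with every shortcut stratum and the results are averaged under a fixed prior, the prediction cannot depend on which shortcut happened to co-occur with the input, which is the intuition behind stable predictions under distribution shift.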