Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal sentiment analysis methods overlook signals shared exclusively by subsets of modalities, limiting the expressiveness and discriminative power of learned representations. To address this, the paper proposes a tri-subspace disentanglement framework that explicitly decomposes features into three complementary subspaces: globally shared, pairwise modality-shared, and modality-private. Subspace independence is enforced through disentanglement supervision and structural regularization. A Subspace-Aware Cross-Attention (SACA) module is further introduced to enable fine-grained, adaptive fusion. The approach is presented as the first to model multi-granularity cross-modal affective cues, achieving state-of-the-art performance with 0.691 MAE on CMU-MOSI and 54.9% ACC-7 on CMU-MOSEI, and it also transfers to multimodal intent recognition tasks.

📝 Abstract
Multimodal Sentiment Analysis (MSA) integrates language, visual, and acoustic modalities to infer human sentiment. Most existing methods either focus on globally shared representations or modality-specific features, while overlooking signals that are shared only by certain modality pairs. This limits the expressiveness and discriminative power of multimodal representations. To address this limitation, we propose a Tri-Subspace Disentanglement (TSD) framework that explicitly factorizes features into three complementary subspaces: a common subspace capturing global consistency, submodally-shared subspaces modeling pairwise cross-modal synergies, and private subspaces preserving modality-specific cues. To keep these subspaces pure and independent, we introduce a decoupling supervisor together with structured regularization losses. We further design a Subspace-Aware Cross-Attention (SACA) fusion module that adaptively models and integrates information from the three subspaces to obtain richer and more robust representations. Experiments on CMU-MOSI and CMU-MOSEI demonstrate that TSD achieves state-of-the-art performance across all key metrics, reaching 0.691 MAE on CMU-MOSI and 54.9% ACC-7 on CMU-MOSEI, and also transfers well to multimodal intent recognition tasks. Ablation studies confirm that tri-subspace disentanglement and SACA jointly enhance the modeling of multi-granular cross-modal sentiment cues.
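The factorization described in the abstract can be sketched in a few lines of NumPy. Everything below is an illustrative assumption, not the paper's actual architecture: the feature dimensions, the linear projection heads, and the soft-orthogonality penalty are stand-ins for the (unspecified) encoders, decoupling supervisor, and structured regularization losses.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_sub = 32, 8  # hypothetical feature / subspace dimensions
modalities = ["language", "visual", "acoustic"]
pairs = [("language", "visual"), ("language", "acoustic"), ("visual", "acoustic")]

# Hypothetical per-modality utterance features (batch of 4).
feats = {m: rng.standard_normal((4, d_in)) for m in modalities}

# Hypothetical linear heads realizing the tri-subspace split:
# one common head and one private head per modality, plus one head
# per modality for each modality pair it participates in.
W_common = {m: rng.standard_normal((d_in, d_sub)) for m in modalities}
W_private = {m: rng.standard_normal((d_in, d_sub)) for m in modalities}
W_pair = {(m, p): rng.standard_normal((d_in, d_sub)) for p in pairs for m in p}

common = {m: feats[m] @ W_common[m] for m in modalities}
private = {m: feats[m] @ W_private[m] for m in modalities}
pairwise = {(m, p): feats[m] @ W_pair[(m, p)] for p in pairs for m in p}


def ortho_penalty(a: np.ndarray, b: np.ndarray) -> float:
    """Squared Frobenius norm of the cross-correlation between two
    subspace embeddings -- a common soft-orthogonality regularizer."""
    return float(np.sum((a.T @ b) ** 2))


# One plausible structural regularization term: push each modality's
# private subspace away from its common subspace.
reg = sum(ortho_penalty(private[m], common[m]) for m in modalities)
```

Minimizing `reg` (alongside the task loss) drives the private and common embeddings toward mutual orthogonality, which is one standard way to keep disentangled subspaces "pure and independent" as the abstract requires; the paper's concrete losses may differ.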
Problem

Research questions and friction points this paper is trying to address.

Multimodal Sentiment Analysis
modality-specific features
cross-modal synergies
shared representations
tri-subspace disentanglement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-Subspace Disentanglement
Multimodal Sentiment Analysis
Subspace-Aware Cross-Attention
Modality Disentanglement
Cross-Modal Synergy
Chunlei Meng
Fudan University
Embodied AI, Multimodal, Multi-agent
Jiabin Luo
Peking University
Zhenglin Yan
Fudan University
Zhenyu Yu
Fudan University
Rong Fu
University of Macau
Zhongxue Gan
Fudan University
Chun Ouyang
Associate Professor, PhD, Queensland University of Technology
Process Mining, Explainable AI, Predictive Analytics, AI Robustness, Machine Learning