🤖 AI Summary
This work addresses the imbalance between shared and private (modality-specific) representations in multimodal sentiment analysis, which weakens modality-specific discriminability and reduces cross-modal complementarity. To mitigate this issue, the authors propose the Dual-Branch Rebalancing Framework (DBR), built on top of a standard multimodal decoupling stage. It has three components: Temporal-Structural Factorization (TSF), which disentangles temporal evolution from structural dependencies to reduce redundancy in the shared branch; Anchor-Guided Private Routing (AGPR), which preserves discriminative private features while allowing controlled cross-modal borrowing; and Bidirectional Rebalancing Fusion (BRF), which reunifies the two branches in a context-aware manner for final prediction. Experiments on CMU-MOSI, CMU-MOSEI, and MIntRec show that DBR consistently outperforms the compared baselines, and further analyses attribute the gains to coordinated mitigation of branch imbalance.
📝 Abstract
Multimodal Sentiment Analysis (MSA) requires integrating language, acoustic, and visual signals without sacrificing modality-specific sentiment evidence. Existing methods mainly improve either shared-private decomposition or cross-modal interaction. Although effective, both ultimately depend on how shared and modality-specific evidence is organized before prediction. We observe that, under standard shared-private pipelines, modality heterogeneity often induces a branch-imbalance process: dominant shared patterns accumulate in the shared branch, yielding redundant and modality-biased evidence, while repeated interaction and rigid alignment gradually leak shared information into modality-specific channels and weaken discriminative private representations. As a result, the complementarity between shared and private representations is reduced, limiting robust sentiment reasoning. To address this issue, we propose the Dual-Branch Rebalancing Framework (DBR) on top of a standard multimodal decoupling stage. In the shared branch, a Temporal-Structural Factorization (TSF) module disentangles temporal evolution from structural dependencies and adaptively integrates them to reduce shared redundancy. In the private branch, an Anchor-Guided Private Routing (AGPR) module preserves discriminative modality-specific patterns while allowing controlled cross-modal borrowing. A Bidirectional Rebalancing Fusion (BRF) module then reunifies the two regularized branches in a context-aware manner for final prediction. Extensive experiments on CMU-MOSI, CMU-MOSEI, and MIntRec demonstrate that DBR consistently outperforms the compared baselines. Further analyses show that these improvements come from coordinated mitigation of branch imbalance.
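To make the idea of context-aware reunification of a shared and a private representation more concrete, the following is a minimal sketch of a generic gated fusion step. It is not the BRF module described above: the abstract does not specify BRF's internals, so the function name `gated_fusion`, the scalar gate, and the parameters `gate_weights` and `gate_bias` are all assumptions made purely for illustration.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(shared, private, gate_weights, gate_bias=0.0):
    """Blend a shared and a private feature vector with one scalar gate.

    Hypothetical stand-in for a context-aware fusion step (assumption:
    this is NOT the paper's BRF module). The gate is computed from the
    concatenation of both vectors, so the mix depends on their content.
    """
    if len(shared) != len(private):
        raise ValueError("shared and private vectors must have equal length")
    context = list(shared) + list(private)
    if len(gate_weights) != len(context):
        raise ValueError("gate_weights must match the concatenated length")

    # Scalar gate in (0, 1): values near 1 favor the shared branch,
    # values near 0 favor the private branch.
    score = sum(w * x for w, x in zip(gate_weights, context)) + gate_bias
    gate = sigmoid(score)

    fused = [gate * s + (1.0 - gate) * p for s, p in zip(shared, private)]
    return fused, gate


if __name__ == "__main__":
    # With all-zero gate weights and zero bias the gate is exactly 0.5,
    # so the fused vector is the element-wise average of the two inputs.
    fused, gate = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0] * 4)
    print(gate, fused)
```

In this sketch a single learned gate decides how much each branch contributes. A real model would more plausibly use per-dimension or per-time-step gates and learned projections, which are omitted here for brevity.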