MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis

📅 2025-12-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing domain generalization (DG) methods for multimodal sentiment analysis (MSA) neglect inter-modal collaboration, hindering the modeling of cross-modal semantic invariance; knowledge injection is further constrained by modal fragmentation, leading to representational disintegration. To address these issues, we propose the first DG framework integrating collaborative invariant feature extraction with fine-grained cross-modal knowledge injection. Specifically, we design an Invariant Mixture-of-Experts mechanism to explicitly model inter-modal collaborative invariance; introduce Cross-Modal Adapters to align and fuse knowledge beyond modal boundaries; and jointly employ feature disentanglement and end-to-end optimization to enhance semantic consistency and generalization robustness. Evaluated on three standard benchmarks, our method achieves new state-of-the-art performance, improving average accuracy by 3.2% over prior approaches—demonstrating the efficacy of collaborative modeling and cross-modal knowledge integration for MSA domain generalization.

Technology Category

Application Category

📝 Abstract
Existing methods in domain generalization for Multimodal Sentiment Analysis (MSA) often overlook inter-modal synergies during invariant features extraction, which prevents the accurate capture of the rich semantic information within multimodal data. Additionally, while knowledge injection techniques have been explored in MSA, they often suffer from fragmented cross-modal knowledge, overlooking specific representations that exist beyond the confines of unimodal. To address these limitations, we propose a novel MSA framework designed for domain generalization. Firstly, the framework incorporates a Mixture of Invariant Experts model to extract domain-invariant features, thereby enhancing the model's capacity to learn synergistic relationships between modalities. Secondly, we design a Cross-Modal Adapter to augment the semantic richness of multimodal representations through cross-modal knowledge injection. Extensive domain experiments conducted on three datasets demonstrate that the proposed MIDG achieves superior performance.
Problem

Research questions and friction points this paper is trying to address.

Extracts domain-invariant features to capture inter-modal synergies.
Enhances multimodal semantic richness via cross-modal knowledge injection.
Improves domain generalization in multimodal sentiment analysis.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Invariant Experts extracts domain-invariant multimodal features
Cross-Modal Adapter injects knowledge to enrich semantic representations
Framework enhances inter-modal synergies for domain generalization
Y
Yangle Li
School of Electronics and Information Technology, Sun Yat-Sen University
Danli Luo
Danli Luo
School of Electronics and Information Technology, Sun Yat-Sen University
Haifeng Hu
Haifeng Hu
Sun Yat-sen University