🤖 AI Summary
Multimodal sentiment analysis suffers significant performance degradation when modalities are incomplete (e.g., missing or degraded), and existing methods often rely on predefined missing patterns, limiting generalizability. Method: This paper proposes Senti-iFusion, a hierarchical fusion framework centered on *modality integrity*. It first introduces a learnable integrity-estimation module that quantitatively models the information completeness of each modality. Second, it designs a dual-depth validation mechanism that jointly constrains cross-modal completion at both the semantic and feature levels. Finally, it combines integrity-weighted completion with attention-driven adaptive fusion, trained progressively under a multi-level loss objective. Contribution/Results: The framework improves substantially over state-of-the-art methods on mainstream MSA benchmarks and shows strong robustness, particularly in fine-grained sentiment classification, suggesting practical value for real-world deployment.
📝 Abstract
Multimodal Sentiment Analysis (MSA) is critical for human-computer interaction but faces challenges when modalities are incomplete or missing. Existing methods often assume pre-defined missing modalities or fixed missing rates, limiting their real-world applicability. To address this challenge, we propose Senti-iFusion, an integrity-centered hierarchical fusion framework capable of handling both inter- and intra-modality missingness simultaneously. It comprises three hierarchical components: Integrity Estimation, Integrity-weighted Completion, and Integrity-guided Fusion. First, the Integrity Estimation module predicts the completeness of each modality and mitigates the noise caused by incomplete data. Second, the Integrity-weighted Cross-modal Completion module employs a novel weighting mechanism to disentangle consistent semantic structures from modality-specific representations, enabling the precise recovery of sentiment-related features across the language, acoustic, and visual modalities. Third, a dual-depth validation with semantic- and feature-level losses enforces consistent reconstruction at both the fine-grained (feature) and semantic (high-level) scales. Finally, the Integrity-guided Adaptive Fusion mechanism dynamically selects the dominant modality for attention-based fusion, ensuring that the most reliable modality, judged by completeness and quality, contributes most to the final prediction. Senti-iFusion employs a progressive training approach to ensure stable convergence. Experimental results on popular MSA datasets demonstrate that Senti-iFusion outperforms existing methods, particularly on fine-grained sentiment analysis tasks. The code and our proposed Senti-iFusion model will be made publicly available.
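The three-stage pipeline in the abstract (estimate integrity → integrity-weighted completion → integrity-guided fusion) can be illustrated with a minimal toy sketch. All names and formulas below are illustrative stand-ins, not the paper's actual modules: the paper *learns* integrity with a neural estimator and performs completion/fusion in feature space with attention, whereas here integrity is just the observed-entry fraction and fusion is an integrity-softmax weighted sum.

```python
import math

def integrity_score(x):
    """Toy integrity estimate: fraction of non-missing entries
    (None marks a missing value). The paper instead learns this
    score with a dedicated assessment module."""
    return sum(v is not None for v in x) / len(x)

def complete(x, donors):
    """Toy integrity-weighted completion: fill each missing entry
    with an average of the donor modalities' values at that
    position, weighted by each donor's integrity score."""
    out = list(x)
    for i, v in enumerate(out):
        if v is None:
            avail = [(integrity_score(d), d[i]) for d in donors
                     if d[i] is not None]
            if avail:
                den = sum(s for s, _ in avail)
                out[i] = sum(s * w for s, w in avail) / den
            else:
                out[i] = 0.0  # nothing to borrow from
    return out

def fuse(modalities):
    """Toy integrity-guided fusion: softmax over integrity scores,
    so the most complete modality dominates the weighted sum
    (the paper uses attention-based fusion instead)."""
    scores = [integrity_score(m) for m in modalities]
    exps = [math.exp(s) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(modalities[0])
    return [sum(w * m[i] for w, m in zip(weights, modalities))
            for i in range(dim)]
```

Usage follows the same order as the framework: score each modality, complete it from the others, then fuse, e.g. `fuse([complete(m, others) for m, others in ...])` over language, acoustic, and visual feature vectors with `None` gaps.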