🤖 AI Summary
Multimodal aspect-based sentiment analysis (MABSA) faces two key challenges: strong visual noise interference and difficulty in fine-grained cross-modal alignment. To address these, this paper proposes a gated multimodal LSTM architecture comprising three specialized modules: Syn-mLSTM (for modeling syntactic dependency structures), Sem-mLSTM (to strengthen semantic associations between aspects and textual context), and Fuse-mLSTM (to enable selective vision-language alignment). A hierarchical gating fusion mechanism adaptively suppresses redundant visual signals while precisely capturing fine-grained cross-modal relationships between aspect terms and their corresponding opinion expressions. Evaluated on two Twitter benchmark datasets, the proposed method achieves significant improvements over existing state-of-the-art models, demonstrating superior robustness to visual noise and enhanced capability for cross-modal alignment.
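The paper does not spell out the exact form of its hierarchical gating fusion here, but the core idea of gating out noisy visual features can be sketched generically. The following NumPy example is an illustrative assumption, not the authors' implementation: a learned sigmoid gate computed from both modalities decides, per dimension, how much of the visual representation to blend into the text representation (all names, sizes, and the random initialization are hypothetical).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Hypothetical features: an aspect-aware text representation and a visual one.
h_text = rng.normal(size=d)
h_vis = rng.normal(size=d)

# Gate parameters (randomly initialized here; learned in the real model).
W_g = rng.normal(size=(d, 2 * d))
b_g = np.zeros(d)

# Per-dimension gate: values near 0 suppress (noisy) visual features,
# values near 1 let them through.
g = sigmoid(W_g @ np.concatenate([h_text, h_vis]) + b_g)

# Convex combination of the two modalities, dimension by dimension.
h_fused = g * h_vis + (1.0 - g) * h_text
```

Because the gate lies in (0, 1), each fused dimension is a convex combination of the corresponding text and visual features, which is what lets the model suppress redundant visual signal without discarding the visual channel entirely.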
📝 Abstract
Aspect-based Sentiment Analysis (ABSA) has recently advanced into the multimodal domain, where user-generated content often combines text and images. However, existing multimodal ABSA (MABSA) models struggle to filter noisy visual signals and to effectively align aspects with opinion-bearing content across modalities. To address these challenges, we propose GateMABSA, a novel gated multimodal architecture that integrates syntactic, semantic, and fusion-aware mLSTMs. Specifically, GateMABSA introduces three specialized mLSTMs: Syn-mLSTM to incorporate syntactic structure, Sem-mLSTM to emphasize aspect-semantic relevance, and Fuse-mLSTM to perform selective multimodal fusion. Extensive experiments on two benchmark Twitter datasets demonstrate that GateMABSA outperforms several baselines.