AI Summary
This work addresses the lack of explicit, trustworthy natural language explanations in existing multimodal aspect-based sentiment analysis methods, which often struggle to balance accuracy and interpretability. The study reformulates the task as a generative, explainable one and, for the first time, leverages a multimodal large language model with prompt learning to jointly generate sentiment polarities and aspect-specific explanatory texts. To strengthen the model's reasoning over aspect-relevant sentiment cues, the authors introduce a dependency-syntax-guided strategy. Experimental results indicate that the proposed approach not only improves classification accuracy but also produces explanations that are more faithful to the input and tailored to the target aspect.
Abstract
Multimodal aspect-based sentiment analysis (MABSA) aims to identify aspect-level sentiments by jointly modeling textual and visual information, which is essential for fine-grained opinion understanding in social media. Existing approaches mainly rely on discriminative classification with complex multimodal fusion, yet they lack explicit sentiment explainability. In this paper, we reformulate MABSA as a generative and explainable task and propose a unified framework that simultaneously predicts aspect-level sentiment and generates natural language explanations. Built on multimodal large language models (MLLMs), our approach employs a prompt-based generative paradigm that jointly produces the sentiment and its explanation. To further enhance aspect-oriented reasoning, we propose a dependency-syntax-guided sentiment cue strategy. This strategy prunes and textualizes the aspect-centered dependency syntax tree, guiding the model to distinguish the sentiments of different aspects and enhancing its explainability. To support explanation generation, we use MLLMs to construct new datasets annotated with sentiment explanations and use them for fine-tuning. Experiments show that our approach not only achieves consistent gains in sentiment classification accuracy, but also produces faithful, aspect-grounded explanations.
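To make the idea of pruning and textualizing an aspect-centered dependency tree more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hand-written parse (`PARSE`), a hop-distance pruning rule (`max_hops`), an arrow-style textual format, and a hypothetical prompt template (`build_prompt`); none of these details are given in the abstract, and the visual input is omitted entirely.

```python
from collections import deque

# Hypothetical hand-written parse: (index, token, head_index, relation).
# A head_index of -1 marks the root. A real system would obtain this from a parser.
PARSE = [
    (0, "The", 1, "det"),
    (1, "pizza", 2, "nsubj"),
    (2, "was", -1, "root"),
    (3, "great", 2, "acomp"),
    (4, "but", 2, "cc"),
    (5, "the", 6, "det"),
    (6, "service", 7, "nsubj"),
    (7, "was", 2, "conj"),
    (8, "slow", 7, "acomp"),
]


def prune(parse, aspect_idx, max_hops=2):
    """Keep tokens within max_hops dependency arcs of the aspect (assumed rule)."""
    # Treat dependency arcs as undirected edges.
    adj = {i: set() for i, _, _, _ in parse}
    for i, _, head, _ in parse:
        if head >= 0:
            adj[i].add(head)
            adj[head].add(i)
    # Breadth-first search outward from the aspect token.
    dist = {aspect_idx: 0}
    queue = deque([aspect_idx])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return [row for row in parse if row[0] in dist]


def textualize(parse, kept):
    """Render the arcs of the pruned subtree as plain text (assumed format)."""
    tokens = {i: tok for i, tok, _, _ in parse}
    kept_ids = {i for i, _, _, _ in kept}
    return "; ".join(
        f"{tok} --{rel}--> {tokens[head]}"
        for _, tok, head, rel in kept
        if head in kept_ids
    )


def build_prompt(sentence, aspect, cues):
    """Hypothetical text-only prompt template."""
    return (
        f"Sentence: {sentence}\n"
        f"Aspect: {aspect}\n"
        f"Syntax cues: {cues}\n"
        "Give the sentiment toward the aspect and a short explanation."
    )


sentence = " ".join(tok for _, tok, _, _ in PARSE)
cues_pizza = textualize(PARSE, prune(PARSE, aspect_idx=1))
cues_service = textualize(PARSE, prune(PARSE, aspect_idx=6))
print(build_prompt(sentence, "pizza", cues_pizza))
print(build_prompt(sentence, "service", cues_service))
```

In this toy sentence, the cues for "pizza" retain "great" and drop "slow", while the cues for "service" do the reverse, which is the kind of aspect separation the strategy is meant to encourage.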