Enhanced Multimodal Aspect-Based Sentiment Analysis by LLM-Generated Rationales

📅 2025-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenges of fine-grained semantic alignment and the limited comprehension capacity of small language models (SLMs) in multimodal aspect-based sentiment analysis (MABSA), this paper proposes LRSA, a novel collaborative framework. LRSA pioneers the integration of interpretable, LLM-generated rationales into SLM decision-making, enabling transparent and grounded predictions without end-to-end LLM training. It introduces a dual cross-attention mechanism to enhance bidirectional image–text feature interaction and alignment, overcoming inherent limitations of pure fine-tuning or prompt engineering. The framework achieves a favorable trade-off between interpretability and computational efficiency. Evaluated on three mainstream MABSA benchmarks, LRSA consistently outperforms state-of-the-art methods, yielding average F1-score improvements of 2.3–4.1 percentage points. Moreover, it demonstrates strong generalizability across diverse pre-trained vision-language backbones, validating both its effectiveness and architectural versatility.

๐Ÿ“ Abstract
There has been growing interest in Multimodal Aspect-Based Sentiment Analysis (MABSA) in recent years. Existing methods predominantly rely on pre-trained small language models (SLMs) to collect aspect- and sentiment-related information from both image and text, aiming to align these two modalities. However, SLMs possess limited capacity and knowledge, often resulting in inaccurate identification of meaning, aspects, sentiments, and their interconnections in textual and visual data. On the other hand, large language models (LLMs) have shown exceptional capabilities in various tasks by effectively exploring fine-grained information in multimodal data. However, some studies indicate that LLMs still fall short of fine-tuned small models in the field of ABSA. Based on these findings, we propose a novel framework, termed LRSA, which combines the decision-making capabilities of SLMs with additional information provided by LLMs for MABSA. Specifically, we inject explanations generated by LLMs as rationales into SLMs and employ a dual cross-attention mechanism to enhance feature interaction and fusion, thereby augmenting the SLMs' ability to identify aspects and sentiments. We evaluated our method on two baseline models; extensive experiments highlight the superiority of our approach on three widely used benchmarks, indicating its generalizability and applicability to most pre-trained models for MABSA.
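The rationale-injection idea described in the abstract can be pictured as prepending the LLM's explanation to the SLM's input so the smaller model conditions on it at decision time. The sketch below is illustrative only: the prompt template, field labels, and `[SEP]` separator are assumptions, not the paper's exact format.

```python
def build_slm_input(sentence: str, aspect: str, rationale: str, sep: str = "[SEP]") -> str:
    """Concatenate the original sentence, the target aspect, and the
    LLM-generated rationale into a single input string for the SLM.

    The field labels and separator token are hypothetical; the paper's
    actual injection format may differ.
    """
    return f"{sentence} {sep} aspect: {aspect} {sep} rationale: {rationale}"


# Example: an LLM explanation grounds the sentiment toward the "service" aspect.
inp = build_slm_input(
    "The pasta was amazing but the service was slow.",
    "service",
    "The text describes the service negatively ('slow').",
)
print(inp)
```

The SLM then classifies sentiment from this enriched input, so the LLM is only queried for explanations and never fine-tuned end to end.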
Problem

Research questions and friction points this paper is trying to address.

Improving Multimodal Aspect-Based Sentiment Analysis accuracy
Enhancing small language models with LLM-generated rationales
Aligning textual and visual data for better sentiment identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-generated rationales enhance SLM decision-making
Dual cross-attention improves feature interaction
Combines SLM and LLM strengths for MABSA
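The dual cross-attention mechanism listed above can be sketched as two symmetric attention passes: text tokens query image patches, and image patches query text tokens, so each modality is enriched with context from the other. This is a minimal single-head NumPy sketch under assumed dimensions, not the authors' implementation (which would use learned projections and multiple heads).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, key_value):
    """Single-head cross-attention: `query` tokens attend over `key_value` tokens."""
    d = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d)      # (n_q, n_kv) similarity scores
    return softmax(scores) @ key_value             # (n_q, d) context-enriched queries


def dual_cross_attention(text_feats, image_feats):
    """Bidirectional image-text interaction: text attends to image and vice versa."""
    t = cross_attention(text_feats, image_feats)   # text enriched with visual context
    v = cross_attention(image_feats, text_feats)   # image enriched with textual context
    return t, v


text = np.random.randn(10, 64)    # e.g. 10 text token embeddings
image = np.random.randn(49, 64)   # e.g. 49 image patch embeddings
t, v = dual_cross_attention(text, image)
print(t.shape, v.shape)  # (10, 64) (49, 64)
```

Each output keeps its query side's shape, so the fused features can be passed straight to the SLM's downstream aspect and sentiment heads.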
Jun Cao
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
Jiyi Li
Faculty of Engineering, Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Kofu 400-8511, Japan
Ziwei Yang
Bioinformatics Center, Institute for Chemical Research, Kyoto University
Bioinformatics · Machine Learning · Computational Biology · Biomedical Data Science
Renjie Zhou
School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China