AI Summary
To address the challenges of fine-grained semantic alignment and the limited comprehension capacity of small language models (SLMs) in multimodal aspect-based sentiment analysis (MABSA), this paper proposes LRSA, a novel collaborative framework. LRSA pioneers the integration of interpretable, LLM-generated rationales into SLM decision-making, enabling transparent and grounded predictions without end-to-end LLM training. It introduces a dual cross-attention mechanism to enhance bidirectional image-text feature interaction and alignment, overcoming inherent limitations of pure fine-tuning or prompt engineering. The framework achieves a favorable trade-off between interpretability and computational efficiency. Evaluated on three mainstream MABSA benchmarks, LRSA consistently outperforms state-of-the-art methods, yielding average F1-score improvements of 2.3–4.1 percentage points. Moreover, it demonstrates strong generalizability across diverse pre-trained vision-language backbones, validating both its effectiveness and architectural versatility.
Abstract
There has been growing interest in Multimodal Aspect-Based Sentiment Analysis (MABSA) in recent years. Existing methods predominantly rely on pre-trained small language models (SLMs) to collect aspect- and sentiment-related information from both image and text, with the aim of aligning the two modalities. However, SLMs possess limited capacity and knowledge, often resulting in inaccurate identification of meaning, aspects, sentiments, and their interconnections in textual and visual data. Large language models (LLMs), on the other hand, have shown exceptional capabilities in various tasks by effectively exploring fine-grained information in multimodal data. Yet some studies indicate that LLMs still fall short of fine-tuned small models in ABSA. Motivated by these findings, we propose a novel framework, termed LRSA, which combines the decision-making capabilities of SLMs with additional information provided by LLMs for MABSA. Specifically, we inject explanations generated by LLMs as rationales into SLMs and employ a dual cross-attention mechanism to enhance feature interaction and fusion, thereby augmenting the SLMs' ability to identify aspects and sentiments. We evaluated our method on two baseline models; extensive experiments demonstrate the superiority of our approach on three widely used benchmarks, indicating its generalizability and applicability to most pre-trained models for MABSA.
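The dual cross-attention described above can be illustrated schematically. The sketch below is an assumption about the general pattern, not the paper's actual implementation: it runs scaled dot-product attention in both directions (text queries attending over image features, and image queries attending over text features), omitting the learned projection matrices, multi-head splitting, residual connections, and normalization a real model would use. All function names and dimensions here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # scaled dot-product attention: queries come from one modality,
    # keys and values from the other (projections omitted for brevity)
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (n_q, n_kv)
    return softmax(scores) @ keys_values            # (n_q, d)

def dual_cross_attention(text_feats, image_feats):
    # text-to-image: each text token attends over image regions
    text_enriched = cross_attention(text_feats, image_feats)
    # image-to-text: each image region attends over text tokens
    image_enriched = cross_attention(image_feats, text_feats)
    return text_enriched, image_enriched

# toy features: 6 text tokens and 4 image regions, shared dim 64
rng = np.random.default_rng(0)
text = rng.standard_normal((6, 64))
image = rng.standard_normal((4, 64))
t_out, i_out = dual_cross_attention(text, image)
print(t_out.shape, i_out.shape)  # (6, 64) (4, 64)
```

Each output keeps the sequence length of its query modality while mixing in information from the other, which is what allows the fused features to be fed back into the SLM for aspect and sentiment prediction.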