An Empirical Study on Configuring In-Context Learning Demonstrations for Unleashing MLLMs' Sentimental Perception Capability

📅 2025-05-22

🤖 AI Summary

Problem: Multimodal large language models (MLLMs) handle many multimodal tasks well in the zero-shot setting, but their performance drops markedly on multimodal sentiment analysis (MSA), which casts doubt on their capacity for affective perception. Method: This work presents the first systematic investigation of intrinsic biases in MLLM-based sentiment prediction and proposes an interpretable, reusable in-context learning (ICL) configuration paradigm built on three dimensions: demonstration retrieval, presentation format, and distributional structure. The framework integrates retrieval augmentation, dynamic example ranking, structured prompting, and bias calibration. Results: Experiments on six benchmark MSA datasets show an average accuracy gain of 15.9% over the zero-shot baseline and 11.2% over the random ICL baseline, substantially narrowing the sentiment understanding gap between MLLMs used without fine-tuning and supervised models.

📝 Abstract

Advances in Multimodal Large Language Models (MLLMs) have enabled various multimodal tasks to be addressed under a zero-shot paradigm. This paradigm sidesteps the cost of model fine-tuning and has become a dominant trend in practical applications. Nevertheless, Multimodal Sentiment Analysis (MSA), a pivotal challenge in the quest for general artificial intelligence, does not yet benefit from this convenience. The zero-shot paradigm performs poorly on MSA, casting doubt on whether MLLMs can perceive sentiments as competently as supervised models. By extending the zero-shot paradigm to In-Context Learning (ICL) and conducting an in-depth study on configuring demonstrations, we validate that MLLMs indeed possess this capability. Specifically, three key factors covering the retrieval, presentation, and distribution of demonstrations are comprehensively investigated and optimized. A sentimental predictive bias inherent in MLLMs is also discovered and then effectively counteracted. By complementing each other, the strategies devised for the three factors yield average accuracy improvements of 15.9% on six MSA datasets over the zero-shot paradigm and 11.2% over the random ICL baseline.

Problem

Research questions and friction points this paper is trying to address.

Optimizing In-Context Learning demonstrations for MLLMs' sentiment analysis

Addressing zero-shot performance gaps in Multimodal Sentiment Analysis

Counteracting inherent sentimental bias in Multimodal Large Language Models
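
To make the idea of counteracting a predictive bias concrete, here is a minimal sketch of one common approach: dividing the model's label probabilities by an estimated label prior and renormalizing, in the spirit of prior-based calibration. This summary does not describe the paper's actual counteraction strategy, so the technique, the function names `calibrate` and `predict`, and the example numbers are all assumptions made for illustration.

```python
def calibrate(probs, prior, eps=1e-9):
    """Rescale label probabilities by an estimated label prior.

    probs: dict mapping label -> model probability for one sample.
    prior: dict mapping label -> how often the model tends to favor that
           label regardless of input (hypothetical; it could be estimated
           from predictions on a held-out set).
    """
    # Dividing by the prior down-weights labels the model over-predicts.
    scores = {label: p / max(prior[label], eps) for label, p in probs.items()}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def predict(probs, prior):
    """Return the label with the highest calibrated probability."""
    calibrated = calibrate(probs, prior)
    return max(calibrated, key=calibrated.get)
```

With a model that leans toward "positive", for instance, a sample scored `{"positive": 0.5, "neutral": 0.3, "negative": 0.2}` under a prior of `{"positive": 0.6, "neutral": 0.3, "negative": 0.1}` would be relabeled as "negative" after calibration, because the raw preference for "positive" is weaker than the model's habitual preference for it.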

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends zero-shot to In-Context Learning for MSA

Optimizes retrieval, presentation, and distribution of demonstrations

Counteracts MLLMs' inherent sentimental predictive bias
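
The three factors listed above (retrieval, presentation, distribution) can be sketched roughly as follows. This is a simplified illustration, not the paper's method: it assumes each candidate demonstration already has an embedding vector (`vec`, hypothetical, e.g. from an image-text encoder), retrieves by cosine similarity, keeps the label distribution balanced by taking the top matches per label, and presents the most similar demonstration closest to the query. All names and the prompt template are illustrative assumptions.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_demonstrations(query_vec, pool, k_per_label=1):
    """Pick the k most similar demonstrations for each sentiment label.

    pool: list of dicts with keys 'vec', 'text', and 'label'.
    """
    by_label = defaultdict(list)
    for item in pool:
        by_label[item["label"]].append((cosine(query_vec, item["vec"]), item))

    chosen = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda p: p[0], reverse=True)
        chosen.extend(ranked[:k_per_label])  # balanced label distribution

    # Order by ascending similarity so the closest match sits next to the query.
    chosen.sort(key=lambda p: p[0])
    return [item for _, item in chosen]


def build_prompt(demos, query_text):
    """Format demonstrations and the query with one fixed text template."""
    blocks = [f"Text: {d['text']}\nSentiment: {d['label']}\n" for d in demos]
    blocks.append(f"Text: {query_text}\nSentiment:")
    return "\n".join(blocks)
```

In a real multimodal setting each demonstration would also carry an image, and the ordering and balancing choices here are only one of many possible configurations.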
