Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current MLLM evaluation of visual affective understanding suffers from three key limitations: closed-set emotion classification, neglect of contextual and subjective factors, and reliance on costly human annotation. To address these, we propose the Emotion Statement Judgment task and an automated construction framework, introducing the first open-vocabulary, multi-dimensional affective assessment paradigm. Our method features a low-effort statement generation pipeline that integrates affect-centered statement construction, zero-shot prompting for evaluation, and a multi-dimensional scoring mechanism, augmented with a human performance benchmark. Systematic experiments reveal that state-of-the-art MLLMs (e.g., GPT-4o) significantly underperform humans in contextualized affective reasoning, particularly in modeling perception subjectivity. This work establishes a scalable, interpretable, and cost-effective standard for evaluating MLLMs' affective capabilities.
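The scoring side of such a statement-judgment benchmark can be pictured with a minimal sketch. All names below (`Statement`, `score_by_dimension`, the dimension labels) are illustrative assumptions, not the paper's actual pipeline or prompts: each emotion-centric statement carries a ground-truth true/false label and an evaluation dimension, and a model's binary judgments are aggregated into per-dimension accuracy.

```python
# Illustrative sketch only: the paper's real statement construction,
# prompting, and scoring dimensions are not reproduced here.
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    image_id: str
    text: str        # emotion-centric statement about the image
    label: bool      # ground truth: does the statement hold for this image?
    dimension: str   # hypothetical axes, e.g. "interpretation", "context", "subjectivity"

def score_by_dimension(statements, predictions):
    """Per-dimension accuracy of a model's true/false judgments.

    `predictions` maps each Statement to the model's boolean answer
    (e.g. parsed from a zero-shot yes/no prompt).
    """
    correct, total = {}, {}
    for s in statements:
        total[s.dimension] = total.get(s.dimension, 0) + 1
        if predictions[s] == s.label:
            correct[s.dimension] = correct.get(s.dimension, 0) + 1
    return {d: correct.get(d, 0) / n for d, n in total.items()}
```

Reporting accuracy per dimension, rather than one aggregate number, is what lets this style of evaluation separate strengths (e.g. context-based judgment) from weaknesses (e.g. perception subjectivity), as the summary above describes.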

📝 Abstract
Recently, Multimodal Large Language Models (MLLMs) have achieved exceptional performance across diverse tasks, continually surpassing previous expectations regarding their capabilities. Nevertheless, their proficiency in perceiving emotions from images remains debated, with studies yielding divergent results in zero-shot scenarios. We argue that this inconsistency stems partly from constraints in existing evaluation methods, including the oversight of plausible responses, limited emotional taxonomies, neglect of contextual factors, and labor-intensive annotations. To facilitate customized visual emotion evaluation for MLLMs, we propose an Emotion Statement Judgment task that overcomes these constraints. Complementing this task, we devise an automated pipeline that efficiently constructs emotion-centric statements with minimal human effort. Through systematically evaluating prevailing MLLMs, our study showcases their stronger performance in emotion interpretation and context-based emotion judgment, while revealing relative limitations in comprehending perception subjectivity. When compared to humans, even top-performing MLLMs like GPT-4o demonstrate remarkable performance gaps, underscoring key areas for future improvement. By developing a fundamental evaluation framework and conducting a comprehensive MLLM assessment, we hope this work contributes to advancing emotional intelligence in MLLMs. Project page: https://github.com/wdqqdw/MVEI.
Problem

Research questions and friction points this paper is trying to address.

Evaluating visual emotion perception in MLLMs
Overcoming limitations in emotion evaluation methods
Automating emotion-centric statement construction for assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline for emotion-centric statement construction
Emotion Statement Judgment task for customized evaluation
Open-vocabulary scalable approach for visual emotion assessment
🔎 Similar Papers
2024-05-14 · IEEE/RSJ International Conference on Intelligent Robots and Systems · Citations: 2