Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current MLLM evaluation of visual affective understanding suffers from three key limitations: closed-set emotion classification, neglect of contextual and subjective factors, and reliance on costly human annotation. To address these, we propose the Emotion Statement Judgment task and an automated construction framework, introducing the first open-vocabulary, multi-dimensional affective assessment paradigm. Our method features a low-effort statement generation pipeline that integrates affect-centered statement construction, zero-shot prompting for evaluation, and a multi-dimensional scoring mechanism, augmented with a human performance benchmark. Systematic experiments reveal that state-of-the-art MLLMs (e.g., GPT-4o) significantly underperform humans in contextualized affective reasoning, particularly in modeling perception subjectivity. This work establishes a scalable, interpretable, and cost-effective standard for evaluating MLLMs' affective capabilities.
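The scoring side of such a statement-judgment benchmark can be pictured with a minimal sketch. All names below (`Statement`, `score_by_dimension`, the dimension labels) are illustrative assumptions, not the paper's actual pipeline or prompts: each emotion-centric statement carries a ground-truth true/false label and an evaluation dimension, and a model's binary judgments are aggregated into per-dimension accuracy.

```python
# Illustrative sketch only: the paper's real statement construction,
# prompting, and scoring dimensions are not reproduced here.
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    image_id: str
    text: str        # emotion-centric statement about the image
    label: bool      # ground truth: does the statement hold for this image?
    dimension: str   # hypothetical axes, e.g. "interpretation", "context", "subjectivity"

def score_by_dimension(statements, predictions):
    """Per-dimension accuracy of a model's true/false judgments.

    `predictions` maps each Statement to the model's boolean answer
    (e.g. parsed from a zero-shot yes/no prompt).
    """
    correct, total = {}, {}
    for s in statements:
        total[s.dimension] = total.get(s.dimension, 0) + 1
        if predictions[s] == s.label:
            correct[s.dimension] = correct.get(s.dimension, 0) + 1
    return {d: correct.get(d, 0) / n for d, n in total.items()}
```

Reporting accuracy per dimension, rather than one aggregate number, is what lets this style of evaluation separate strengths (e.g. context-based judgment) from weaknesses (e.g. perception subjectivity), as the summary above describes.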

📝 Abstract
Recently, Multimodal Large Language Models (MLLMs) have achieved exceptional performance across diverse tasks, continually surpassing previous expectations regarding their capabilities. Nevertheless, their proficiency in perceiving emotions from images remains debated, with studies yielding divergent results in zero-shot scenarios. We argue that this inconsistency stems partly from constraints in existing evaluation methods, including the oversight of plausible responses, limited emotional taxonomies, neglect of contextual factors, and labor-intensive annotations. To facilitate customized visual emotion evaluation for MLLMs, we propose an Emotion Statement Judgment task that overcomes these constraints. Complementing this task, we devise an automated pipeline that efficiently constructs emotion-centric statements with minimal human effort. Through systematically evaluating prevailing MLLMs, our study showcases their stronger performance in emotion interpretation and context-based emotion judgment, while revealing relative limitations in comprehending perception subjectivity. When compared to humans, even top-performing MLLMs like GPT-4o demonstrate remarkable performance gaps, underscoring key areas for future improvement. By developing a fundamental evaluation framework and conducting a comprehensive MLLM assessment, we hope this work contributes to advancing emotional intelligence in MLLMs. Project page: https://github.com/wdqqdw/MVEI.
Problem

Research questions and friction points this paper is trying to address.

Evaluating visual emotion perception in MLLMs
Overcoming limitations in emotion evaluation methods
Automating emotion-centric statement construction for assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline for emotion-centric statement construction
Emotion Statement Judgment task for customized evaluation
Open-vocabulary scalable approach for visual emotion assessment
🔎 Similar Papers
2024-05-14 · IEEE/RSJ International Conference on Intelligent Robots and Systems · Citations: 2