EEmo-Logic: A Unified Dataset and Multi-Stage Framework for Comprehensive Image-Evoked Emotion Assessment

📅 2026-02-01
🤖 AI Summary
Existing approaches to image-evoked emotion understanding are often limited to coarse-grained perception or lack fine-grained reasoning, which makes it difficult to capture the multi-dimensional attributes and intensity variations of emotions. To address this gap, this work introduces EEmoDB, described as the largest image-evoked emotion understanding dataset to date, covering 5 analysis dimensions across 5 task categories. It also proposes EEmo-Logic, an all-in-one multimodal large language model (MLLM) trained with instruction fine-tuning followed by task-customized group relative preference optimization (GRPO) to improve fine-grained emotion assessment and question-answering performance. Experiments show that EEmo-Logic achieves robust performance on both in-domain and cross-domain datasets.

📝 Abstract
Understanding the multi-dimensional attributes and intensity nuances of image-evoked emotions is pivotal for advancing machine empathy and empowering diverse human-computer interaction applications. However, existing models are still limited to coarse-grained emotion perception or suffer from deficient reasoning capabilities. To bridge this gap, we introduce EEmoDB, the largest image-evoked emotion understanding dataset to date. It features 5 analysis dimensions spanning 5 distinct task categories, facilitating comprehensive interpretation. Specifically, we compile 1.2M question-answering (QA) pairs (EEmoDB-QA) from 125k images via automated generation, alongside a 36k dataset (EEmoDB-Assess) curated from 25k images for fine-grained assessment. Furthermore, we propose EEmo-Logic, an all-in-one multimodal large language model (MLLM) developed via instruction fine-tuning and task-customized group relative preference optimization (GRPO) with novel reward design. Extensive experiments demonstrate that EEmo-Logic achieves robust performance on in-domain and cross-domain datasets, excelling in emotion QA and fine-grained assessment. The code is available at https://anonymous.4open.science/r/EEmoLogic.
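The abstract does not spell out the GRPO objective or the reward design. As a rough illustration only, GRPO-style training as commonly described samples several responses to the same prompt, scores each one, and normalizes every reward against the mean and standard deviation of its own group. The sketch below assumes that common formulation; the function name, the `eps` term, and the reward values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Assumption: this is the group-normalized advantage used in common
    GRPO-style methods, not the paper's task-customized reward design.
    """
    mu = mean(rewards)        # group mean reward
    sigma = pstdev(rewards)   # population standard deviation of the group
    # eps (hypothetical) avoids division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one emotion question
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses scored above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.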
Problem

Research questions and friction points this paper is trying to address.

image-evoked emotion
emotion understanding
multi-dimensional attributes
fine-grained assessment
machine empathy
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal large language model
emotion understanding dataset
group relative preference optimization
fine-grained emotion assessment
image-evoked emotion
Lancheng Gao
Shanghai Jiao Tong University
Ziheng Jia
Shanghai Jiao Tong University / Shanghai AI Lab
LLM and LMM on Visual Quality Assessment
Zixuan Xing
Institute of Image Communication and Network Engineering, Shanghai Key Laboratory of Digital Media Processing and Transmissions, Shanghai Jiao Tong University, Shanghai
Wei Sun
School of Communication & Electronic Engineering, East China Normal University, Shanghai
Huiyu Duan
Shanghai Jiao Tong University
Multimedia Signal Processing
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University
Multimedia Signal Processing, Visual Quality Assessment, QoE, AI Evaluation, Displays
Xiongkuo Min
Institute of Image Communication and Network Engineering, Shanghai Key Laboratory of Digital Media Processing and Transmissions, Shanghai Jiao Tong University, Shanghai