EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis

📅 2025-11-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Visual Emotion Analysis (VEA) datasets suffer from limited openness and poor interpretability, offering only image-level discrete labels—thus failing to reveal how specific visual elements drive emotional responses. Method: We introduce EmoVerse, the first large-scale open-source VEA dataset (219K images) supporting dual-space (categorical + dimensional) emotion modeling. It innovatively adopts a Background-Attribute-Subject (B-A-S) triplet structure and knowledge-graph-inspired hierarchical annotations to enable fine-grained emotion decomposition and word-level/subject-level visual grounding. A multi-stage multimodal large language model (MLLM)-based automatic annotation pipeline—validated by human experts—maps visual cues to continuous emotion spaces and generates natural-language attribution explanations. Contribution/Results: EmoVerse significantly enhances transparency and interpretability in VEA, establishing a foundational resource for high-level affective computing and explainable AI-driven emotion understanding.

📝 Abstract
Visual Emotion Analysis (VEA) aims to bridge the affective gap between visual content and human emotional responses. Despite its promise, progress in this field remains limited by the lack of open-source and interpretable datasets. Most existing studies assign a single discrete emotion label to an entire image, offering limited insight into how visual elements contribute to emotion. In this work, we introduce EmoVerse, a large-scale open-source dataset that enables interpretable visual emotion analysis through multi-layered, knowledge-graph-inspired annotations. By decomposing emotions into Background-Attribute-Subject (B-A-S) triplets and grounding each element to visual regions, EmoVerse provides word-level and subject-level emotional reasoning. With over 219k images, the dataset further includes dual annotations in Categorical Emotion States (CES) and Dimensional Emotion Space (DES), facilitating unified discrete and continuous emotion representation. A novel multi-stage pipeline ensures high annotation reliability with minimal human effort. Finally, we introduce an interpretable model that maps visual cues into DES representations and provides detailed attribution explanations. Together, the dataset, pipeline, and model form a comprehensive foundation for advancing explainable high-level emotion understanding.
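To make the annotation structure described above concrete, here is a minimal sketch of what an EmoVerse-style record could look like: a Background-Attribute-Subject (B-A-S) triplet grounded to an image region, plus dual labels in Categorical Emotion States (CES) and Dimensional Emotion Space (DES). The field names, value ranges, and example values are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema sketch; field names and ranges are assumptions,
# not EmoVerse's published format.

@dataclass
class BASTriplet:
    background: str                          # scene context, e.g. "stormy beach"
    attribute: str                           # emotional attribute, e.g. "gloomy"
    subject: str                             # main subject, e.g. "lone figure"
    subject_box: Tuple[int, int, int, int]   # (x, y, w, h) visual grounding region

@dataclass
class EmoVerseAnnotation:
    image_id: str
    triplets: List[BASTriplet]               # word-/subject-level decomposition
    ces_label: str                           # discrete label (Categorical Emotion State)
    des: Tuple[float, float, float]          # continuous (valence, arousal, dominance)
    explanation: str                         # natural-language attribution

ann = EmoVerseAnnotation(
    image_id="0001",
    triplets=[BASTriplet("stormy beach", "gloomy", "lone figure",
                         (120, 80, 60, 140))],
    ces_label="sadness",
    des=(-0.6, 0.3, -0.2),
    explanation="The gloomy storm surrounding the lone figure conveys sadness.",
)
print(ann.ces_label, ann.des)
```

A record like this shows why the dual-space design matters: the same grounded triplet can supervise both a discrete classifier (via `ces_label`) and a continuous regressor (via `des`), while `explanation` supports attribution-style interpretability.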
Problem

Research questions and friction points this paper is trying to address.

Addresses the lack of interpretable datasets for visual emotion analysis
Explains how visual elements contribute to emotional responses in images
Enables unified discrete and continuous emotion representation modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-layered knowledge-graph-inspired emotion annotations
Dual categorical and dimensional emotion representation system
Visual-to-emotion mapping model with attribution explanations
👥 Authors
Yijie Guo
University of Science and Technology of China
Dexiang Hong
ByteDance Inc.
Computer Vision, Deep Learning, Diffusion Model
Weidong Chen
University of Science and Technology of China
Zihan She
University of Science and Technology of China
Cheng Ye
University of Science and Technology of China
Xiaojun Chang
University of Science and Technology of China
Zhendong Mao
University of Science and Technology of China
CV, NLP