Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models

📅 2025-04-10
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing sentiment analysis primarily focuses on emotion classification while neglecting the underlying causes of emotional responses. Method: This paper introduces Emotion Interpretation (EI), a novel task aimed at identifying both explicit (e.g., objects, interactions) and implicit (e.g., cultural context, off-screen events) causal factors driving emotions. To support EI, the authors construct EIBench, the first large-scale EI benchmark (1,665 samples: 1,615 basic and 50 complex), and propose the Coarse-to-Fine Self-Ask (CFSA) annotation paradigm, which leverages Vision-Language Large Models (VLLMs) for high-quality causal labeling. The approach integrates multimodal large language models, causal reasoning frameworks, and cross-modal alignment techniques. Contribution/Results: Experiments reveal significant performance bottlenecks of mainstream models on EI, especially in complex causal reasoning scenarios. Both EIBench and CFSA are publicly released, establishing a new foundation for empathetic, context-aware AI systems.

๐Ÿ“ Abstract
Most existing emotion analysis emphasizes which emotion arises (e.g., happy, sad, angry) but neglects the deeper why. We propose Emotion Interpretation (EI), focusing on causal factors-whether explicit (e.g., observable objects, interpersonal interactions) or implicit (e.g., cultural context, off-screen events)-that drive emotional responses. Unlike traditional emotion recognition, EI tasks require reasoning about triggers instead of mere labeling. To facilitate EI research, we present EIBench, a large-scale benchmark encompassing 1,615 basic EI samples and 50 complex EI samples featuring multifaceted emotions. Each instance demands rationale-based explanations rather than straightforward categorization. We further propose a Coarse-to-Fine Self-Ask (CFSA) annotation pipeline, which guides Vision-Language Models (VLLMs) through iterative question-answer rounds to yield high-quality labels at scale. Extensive evaluations on open-source and proprietary large language models under four experimental settings reveal consistent performance gaps-especially for more intricate scenarios-underscoring EI's potential to enrich empathetic, context-aware AI applications. Our benchmark and methods are publicly available at: https://github.com/Lum1104/EIBench, offering a foundation for advanced multimodal causal analysis and next-generation affective computing.
Problem

Research questions and friction points this paper is trying to address.

Focuses on causal factors driving emotional responses
Proposes Emotion Interpretation beyond traditional emotion recognition
Introduces EIBench for rationale-based emotion analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emotion Interpretation focusing on causal factors
Coarse-to-Fine Self-Ask annotation pipeline
Multimodal Large Language Models for EI
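The innovations above can be illustrated with a minimal sketch of a CFSA-style annotation loop. This is not the authors' released implementation: `query_vllm` is a hypothetical stand-in (here a canned stub) for a real Vision-Language Large Model call, and the three stages (coarse emotion, explicit causes, implicit causes) are an assumed reading of the coarse-to-fine, iterative self-ask idea described in the abstract.

```python
def query_vllm(image, question):
    """Stub for a VLLM call; a real system would send the image and
    question to a model API and return its free-text answer."""
    canned = {
        "coarse": "The person appears happy.",
        "explicit": "They are holding a trophy and smiling at teammates.",
        "implicit": "Winning likely follows a long period of effort.",
    }
    for key, answer in canned.items():
        if key in question:
            return answer
    return "Unknown."


def cfsa_annotate(image, stages=("coarse", "explicit", "implicit")):
    """Iteratively self-ask, moving from a coarse emotion label toward
    fine-grained explicit and implicit causal factors."""
    qa_history = []
    for stage in stages:
        question = f"({stage}) What {stage} factor explains the emotion shown?"
        answer = query_vllm(image, question)
        qa_history.append((question, answer))
    # Merge the question-answer rounds into one rationale-style label.
    rationale = " ".join(answer for _, answer in qa_history)
    return {"qa_history": qa_history, "rationale": rationale}


label = cfsa_annotate(image=None)
print(label["rationale"])
```

The design point is that each round conditions the annotator on a narrower question than the last, so the final label is a causal rationale rather than a single emotion category.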