🤖 AI Summary
This work addresses the limitations of existing emotion understanding approaches, which predominantly rely on short texts and predefined categorical labels while neglecting the structured dependencies among emotion dimensions, thereby hindering context-aware multi-dimensional reasoning. To bridge this gap, the authors construct EmoScene, a benchmark of 4,731 context-rich scenarios annotated with eight-dimensional emotion vectors grounded in Plutchik's emotion theory, and introduce the concept of "emotion entanglement." They propose an entanglement-aware Bayesian inference framework that integrates emotion co-occurrence statistics to jointly model structural dependencies across multiple emotion dimensions. Experiments show that zero-shot multi-label emotion prediction on EmoScene remains difficult: the best-performing large language model reaches a Macro F1 of only 0.501. The proposed Bayesian post-processing improves the consistency and accuracy of predictions, lifting weaker models such as Qwen2.5-7B by 0.051 in Macro F1.
📝 Abstract
Understanding emotions in natural language is inherently a multi-dimensional reasoning problem, where multiple affective signals interact through context, interpersonal relations, and situational cues. However, most existing emotion understanding benchmarks rely on short texts and predefined emotion labels, reducing this process to independent label prediction and ignoring the structured dependencies among emotions. To address this limitation, we introduce Emotional Scenarios (EmoScene), a theory-grounded benchmark of 4,731 context-rich scenarios annotated with an 8-dimensional emotion vector derived from Plutchik's basic emotions. We evaluate six instruction-tuned large language models in a zero-shot setting and observe modest performance, with the best model achieving a Macro F1 of 0.501, highlighting the difficulty of context-aware multi-label emotion prediction. Motivated by the observation that emotions rarely occur independently, we further propose an entanglement-aware Bayesian inference framework that incorporates emotion co-occurrence statistics to perform joint posterior inference over the emotion vector. This lightweight post-processing improves the structural consistency of predictions and yields notable gains for weaker models (e.g., +0.051 Macro F1 for Qwen2.5-7B). EmoScene therefore provides a challenging benchmark for studying multi-dimensional emotion understanding and the limitations of current language models.
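The abstract does not spell out the joint posterior inference step, so the sketch below is one plausible instantiation rather than the paper's actual method. It assumes the model exposes an independent probability per emotion, combines those with a pairwise pointwise-mutual-information prior estimated from Laplace-smoothed training co-occurrence counts, and exhaustively scores all 2^8 candidate vectors (the function name, the `alpha` weight, and the PMI form are all illustrative assumptions):

```python
import itertools
import math

# Plutchik's eight basic emotions (ordering assumed for illustration).
EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]

def joint_posterior_decode(model_probs, pair_counts, single_counts,
                           n_examples, alpha=1.0, eps=1e-6):
    """Return the 8-bit emotion vector maximizing an assumed joint score:
    the model's independent per-emotion log-likelihood plus a PMI-style
    pairwise prior from Laplace-smoothed co-occurrence statistics."""
    n = len(EMOTIONS)
    best_vec, best_score = None, -math.inf
    for vec in itertools.product((0, 1), repeat=n):
        # Likelihood: treat the model's per-emotion probabilities as independent.
        score = 0.0
        for p, v in zip(model_probs, vec):
            p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
            score += math.log(p if v else 1.0 - p)
        # Prior: reward pairs of active emotions that co-occur more often
        # than chance in the training data (pointwise mutual information).
        for i, j in itertools.combinations(range(n), 2):
            if vec[i] and vec[j]:
                p_ij = (pair_counts.get((i, j), 0) + 1) / (n_examples + 2)
                p_i = (single_counts[i] + 1) / (n_examples + 2)
                p_j = (single_counts[j] + 1) / (n_examples + 2)
                score += alpha * math.log(p_ij / (p_i * p_j))
        if score > best_score:
            best_vec, best_score = vec, score
    return best_vec
```

With `alpha=0` this reduces to thresholding each probability at 0.5; increasing `alpha` pulls predictions toward emotion pairs that frequently co-occur in training data, which is one way a co-occurrence prior could repair structurally inconsistent multi-label outputs. Enumerating all 256 candidates is cheap here because Plutchik's inventory fixes the dimension at eight.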