EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models

πŸ“… 2025-05-16
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work formally defines and systematically evaluates emotion hallucination in multimodal large language models (MLLMs): the generation of content inconsistent with, or unrelated to, the emotional semantics of the input. The authors introduce EmotionHallucer, the first dedicated benchmark for this problem, which assesses hallucinations along two dimensions, emotion psychology knowledge and real-world multimodal perception, using an adversarial binary question-answering (QA) protocol built from carefully crafted basic and hallucinated pairs. They further propose the PEP-MEK framework for improving hallucination detection. Experiments across 38 LLMs and MLLMs reveal that emotion hallucination is pervasive in current models, that closed-source and strong-reasoning models are more robust, and that PEP-MEK yields an average 9.90% improvement in detection accuracy. The study establishes a benchmark and methodological foundation for trustworthy emotion understanding in MLLMs.


πŸ“ Abstract
Emotion understanding is a critical yet challenging task. Recent advances in Multimodal Large Language Models (MLLMs) have significantly enhanced their capabilities in this area. However, MLLMs often suffer from hallucinations, generating irrelevant or nonsensical content. To the best of our knowledge, despite the importance of this issue, there has been no dedicated effort to evaluate emotion-related hallucinations in MLLMs. In this work, we introduce EmotionHallucer, the first benchmark for detecting and analyzing emotion hallucinations in MLLMs. Unlike humans, whose emotion understanding stems from the interplay of biology and social learning, MLLMs rely solely on data-driven learning and lack innate emotional instincts. Fortunately, emotion psychology provides a solid foundation of knowledge about human emotions. Building on this, we assess emotion hallucinations from two dimensions: emotion psychology knowledge and real-world multimodal perception. To support robust evaluation, we utilize an adversarial binary question-answer (QA) framework, which employs carefully crafted basic and hallucinated pairs to assess the emotion hallucination tendencies of MLLMs. By evaluating 38 LLMs and MLLMs on EmotionHallucer, we reveal that: i) most current models exhibit substantial issues with emotion hallucinations; ii) closed-source models outperform open-source ones in detecting emotion hallucinations, and reasoning capability provides additional advantages; iii) existing models perform better in emotion psychology knowledge than in multimodal emotion perception. As a byproduct, these findings inspire us to propose the PEP-MEK framework, which yields an average improvement of 9.90% in emotion hallucination detection across selected models. Resources will be available at https://github.com/xxtars/EmotionHallucer.
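The adversarial binary QA framework described above can be illustrated with a minimal sketch. Note that the pair structure, field names, and the pair-level scoring rule below are assumptions for illustration, not the authors' exact protocol: the key idea is that each basic question (correct answer "yes") is paired with a hallucinated counterpart (correct answer "no"), and a model earns credit only when it answers both, which penalizes yes-biased or guessing models.

```python
# Hypothetical sketch of an adversarial binary QA evaluation in the
# spirit of EmotionHallucer; names and scoring details are assumed.
from dataclasses import dataclass

@dataclass
class QAPair:
    basic_q: str          # question whose correct answer is "yes"
    hallucinated_q: str   # adversarial counterpart whose answer is "no"

def pair_accuracy(pairs, answer):
    """Credit a pair only if the model answers BOTH questions correctly."""
    correct = sum(
        1 for p in pairs
        if answer(p.basic_q) == "yes" and answer(p.hallucinated_q) == "no"
    )
    return correct / len(pairs)

pairs = [QAPair("Does the speaker sound happy?",
                "Does the speaker sound angry?")]

# A degenerate model that always says "yes" gets 50% per-question
# accuracy but 0% pair accuracy under the adversarial pairing.
print(pair_accuracy(pairs, lambda q: "yes"))  # 0.0
```

Pair-level scoring is what makes the protocol "adversarial": a model cannot score well by exploiting answer-distribution biases, so high accuracy requires genuinely distinguishing the input's emotional content from hallucinated alternatives.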
Problem

Research questions and friction points this paper is trying to address.

No dedicated benchmark exists for evaluating emotion hallucinations in Multimodal Large Language Models
Detecting emotion-related hallucinations requires grounding in both psychology knowledge and multimodal perception
Current MLLMs' emotion understanding and their tendency to hallucinate remain unmeasured
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces EmotionHallucer, the first benchmark for emotion hallucinations in MLLMs
Uses an adversarial binary QA framework with paired basic and hallucinated questions
Proposes the PEP-MEK framework, improving hallucination detection by 9.90% on average