EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current multimodal large language models (MLLMs) lack emotional intelligence (EI), hindering their ability to meet affective requirements in human–machine interaction; moreover, existing benchmarks are predominantly static and unimodal, failing to capture the dynamic, multimodal nature of real-world emotional expression. Method: We introduce EmoBench-M, the first EI evaluation benchmark tailored for MLLMs, grounded in psychological EI theory. It features a dynamic multimodal (text + image + implicit context) evaluation framework spanning 13 authentic scenarios across three dimensions: foundational emotion recognition, conversational emotion understanding, and socially complex emotion analysis. The approach integrates psychological scale design, multimodal prompt engineering, human-annotated emotion reasoning chains, and cross-model consistency assessment. Contribution/Results: Systematic evaluation of both open- and closed-source MLLMs reveals a substantial EI performance gap relative to human baselines. All data, code, and tools are publicly released.

📝 Abstract
With the integration of multimodal large language models (MLLMs) into robotic systems and various AI applications, embedding emotional intelligence (EI) capabilities into these models is essential for enabling robots to effectively address human emotional needs and interact seamlessly in real-world scenarios. Existing static, text-based, or text-image benchmarks overlook the multimodal complexities of real-world interactions and fail to capture the dynamic, multimodal nature of emotional expressions, making them inadequate for evaluating MLLMs' EI. Based on established psychological theories of EI, we build EmoBench-M, a novel benchmark designed to evaluate the EI capability of MLLMs across 13 evaluation scenarios from three key dimensions: foundational emotion recognition, conversational emotion understanding, and socially complex emotion analysis. Evaluations of both open-source and closed-source MLLMs on EmoBench-M reveal a significant performance gap between them and humans, highlighting the need to further advance their EI capabilities. All benchmark resources, including code and datasets, are publicly available at https://emo-gml.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Evaluate emotional intelligence in multimodal models
Address limitations of text-based emotional benchmarks
Assess dynamic, multimodal emotional expression recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal language models for emotional intelligence
EmoBench-M for dynamic emotion evaluation
Public benchmark resources available online
He Hu
College of Computer Science and Software Engineering, Shenzhen University; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Yucheng Zhou
University of Macau | Fudan
Machine Learning, Large Language Models, Deep Generative Models, Multimodal Learning, AI Healthcare
Lianzhong You
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
AI, IoT, HPC
Hongbo Xu
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Qianning Wang
Auckland University of Technology
Deep Learning, Natural Language Processing
Zheng Lian
Associate Professor, IEEE/CCF Senior Member, Institute of Automation, Chinese Academy of Sciences
Affective Computing, Sentiment Analysis, Machine Learning
Fei Richard Yu
College of Computer Science and Software Engineering, Shenzhen University
Fei Ma
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Laizhong Cui
Shenzhen University
Networking, Edge Computing, IoT, Big Data, Machine Learning