OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition

📅 2024-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing MER methods are constrained by predefined emotion categories, limiting their ability to capture the complexity and fine-grained distinctions inherent in human affect. This paper introduces Open-Vocabulary Multimodal Emotion Recognition (OV-MER), a novel paradigm that pioneers zero-shot open-vocabulary learning for MER—enabling semantic-driven recognition of unseen, compositional, and psychologically grounded non-basic emotions (e.g., “bittersweet”, “awe”). Our contributions include: (1) the first open-vocabulary MER benchmark dataset; (2) a new evaluation metric based on semantic similarity; and (3) a zero-shot generalization architecture integrating cross-modal alignment, CLIP-style contrastive learning, and semantic embedding space mapping. Experiments demonstrate substantial improvements in fine-grained emotion classification accuracy and semantic plausibility, while confirming strong cross-category generalization capability.
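The summary points to an evaluation metric based on semantic similarity rather than exact label matching. Below is a minimal sketch of one way such a metric could be computed, assuming an off-the-shelf sentence encoder ("all-MiniLM-L6-v2" is an illustrative choice) and a set-level best-match rule; the paper's actual formulation may differ.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative encoder choice; any sentence embedding model could be swapped in.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_match_f1(predicted, reference):
    """Set-level score: each emotion word is credited with its best cosine
    similarity to any word in the other set (soft precision/recall)."""
    p = encoder.encode(predicted, normalize_embeddings=True)
    r = encoder.encode(reference, normalize_embeddings=True)
    sim = p @ r.T                        # pairwise cosine similarities
    precision = sim.max(axis=1).mean()   # how well predictions are covered by references
    recall = sim.max(axis=0).mean()      # how well references are covered by predictions
    return 2 * precision * recall / (precision + recall + 1e-8)

print(semantic_match_f1(["bittersweet", "nostalgic"], ["melancholy", "wistful"]))
```

Unlike exact-match accuracy, this kind of score still rewards a prediction such as "bittersweet" when the reference annotation is "melancholy", which is the behavior an open-vocabulary metric needs.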

📝 Abstract
Multimodal Emotion Recognition (MER) is a critical research area that seeks to decode human emotions from diverse data modalities. However, existing machine learning methods predominantly rely on predefined emotion taxonomies, which fail to capture the inherent complexity, subtlety, and multi-appraisal nature of human emotional experiences, as demonstrated by studies in psychology and cognitive science. To overcome this limitation, we advocate for introducing the concept of open vocabulary into MER. This paradigm shift aims to enable models to predict emotions beyond a fixed label space, accommodating a flexible set of categories to better reflect the nuanced spectrum of human emotions. To achieve this, we propose a novel paradigm: Open-Vocabulary MER (OV-MER), which enables emotion prediction without being confined to predefined spaces. However, constructing a dataset that encompasses the full range of emotions for OV-MER is practically infeasible; hence, we present a comprehensive solution including a newly curated database, novel evaluation metrics, and a preliminary benchmark. By advancing MER from basic emotions to more nuanced and diverse emotional states, we hope this work can inspire the next generation of MER, enhancing its generalizability and applicability in real-world scenarios.
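To make "prediction beyond a fixed label space" concrete, the sketch below ranks an extensible emotion-word vocabulary against a fused multimodal embedding by cosine similarity. The fused embedding, encoder checkpoint, and vocabulary are hypothetical stand-ins for illustration, not the paper's pipeline.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

text_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def rank_open_vocabulary(fused_embedding, vocabulary, top_k=3):
    """Score an extensible emotion vocabulary against a fused multimodal
    embedding, assumed to be projected into the text encoder's space."""
    vocab_emb = text_encoder.encode(vocabulary, normalize_embeddings=True)
    fused = fused_embedding / (np.linalg.norm(fused_embedding) + 1e-8)
    scores = vocab_emb @ fused
    top = np.argsort(-scores)[:top_k]
    return [(vocabulary[i], float(scores[i])) for i in top]

# The label space is just a word list, so new emotion terms can be added at
# inference time without retraining a fixed classifier head.
vocab = ["joy", "awe", "bittersweet", "embarrassment", "relief", "contempt"]
dummy_fused = np.random.randn(384)  # stand-in for a real audio+video+text embedding
print(rank_open_vocabulary(dummy_fused, vocab))
```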
Problem

Research questions and friction points this paper is trying to address.

Multimodal Emotion Recognition
Complexity of Human Emotions
Practical Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-Vocabulary MER
Multi-Modal Emotion Recognition
Comprehensive Solution Framework
Zheng Lian
Associate Professor, IEEE/CCF Senior Member, Institute of Automation, Chinese Academy of Sciences
Affective Computing, Sentiment Analysis, Machine Learning
Haiyang Sun
Shanghai Jiao Tong University
Licai Sun
University of Oulu
Affective computing, Deep learning, Machine learning
Lan Chen
Communication University of China
Image/Video generation and editing
Haoyu Chen
University of Oulu
Hao Gu
Sun Yat-Sen University
Planetary aeronomy, Atmospheric escape, Space physics
Zhuofan Wen
Institute of Automation, Chinese Academy of Sciences
Shun Chen
Institute of Automation, Chinese Academy of Sciences
Affective computing, Human-computer interaction, Deep learning
Siyuan Zhang
Institute of Automation, Chinese Academy of Sciences
Hailiang Yao
Institute of Automation, Chinese Academy of Sciences
Mingyu Xu
Bytedance
Large language models, Machine learning
Kang Chen
Peking University
Bin Liu
Institute of Automation, Chinese Academy of Sciences
Rui Liu
Inner Mongolia University
Shan Liang
Department of Intelligent Science, Xi’an Jiaotong-Liverpool University
Ya Li
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Jiangyan Yi
Tsinghua University
Speech signal processing, Speech synthesis, Fake audio detection, Continual learning
Jianhua Tao
Department of Automation, Tsinghua University; Beijing National Research Center for Information Science and Technology, Tsinghua University