🤖 AI Summary
Multimodal Emotion Recognition in Conversations (MERC) aims to enable fine-grained, natural emotion understanding in human–computer interaction by fusing textual, acoustic, and visual modalities. Existing unimodal approaches, however, cannot capture the complementary cues spread across modalities, while naive multimodal fusion struggles with the cross-modal asynchrony and context dependency inherent in dynamic conversational settings. This paper presents the first systematic survey of MERC research: it clarifies task formulations, evaluation benchmarks, and historical evolution; establishes a structured taxonomy encompassing feature- and decision-level fusion, attention mechanisms, graph neural networks, cross-modal contrastive learning, and multi-task learning; identifies core challenges, including cross-modal alignment and dynamic contextual modeling; and analyzes current performance bottlenecks and benchmark gaps. The survey provides a rigorous theoretical framework and an actionable technical roadmap for advancing emotion-aware dialogue systems.
📝 Abstract
While text-based emotion recognition methods have achieved notable success, real-world dialogue systems often demand a more nuanced emotional understanding than any single modality can offer. Multimodal Emotion Recognition in Conversations (MERC) has thus emerged as a crucial direction for enhancing the naturalness and emotional intelligence of human–computer interaction. Its goal is to accurately recognize emotions by integrating information from modalities such as text, speech, and visual signals. This survey offers a systematic overview of MERC, covering its motivations, core tasks, representative methods, and evaluation strategies. We further examine recent trends, highlight key challenges, and outline future directions. As interest in emotionally intelligent systems grows, this survey provides timely guidance for advancing MERC research.