Dynamic Fusion-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition in Conversations

📅 2026-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal conversational emotion recognition methods typically employ fixed-parameter fusion of multimodal features, which struggles to accommodate the dynamic requirements of different emotion categories and thus limits recognition performance. To address this limitation, this work proposes a Dynamic Fusion-aware Graph Convolutional Network (DF-GCN), which introduces ordinary differential equations into the graph convolutional framework for the first time and designs a global information-guided dynamic prompting mechanism that adaptively adjusts fusion parameters according to the target emotion category. Extensive experiments on two public multimodal conversational datasets demonstrate that the proposed method significantly outperforms state-of-the-art models, validating the effectiveness of the dynamic fusion mechanism in enhancing both accuracy and generalization capability in emotion recognition.

📝 Abstract
Multimodal emotion recognition in conversations (MERC) aims to identify and understand the emotions expressed by speakers across multiple modalities (e.g., text, audio, and images) as utterances unfold. Existing studies have shown that GCNs can improve MERC performance by modeling dependencies between speakers. However, existing methods usually process multimodal features with fixed parameters regardless of emotion type, ignoring the dynamics of fusion between modalities; this forces the model to trade off performance across emotion categories and limits its accuracy on specific emotions. To this end, we propose a dynamic fusion-aware graph convolutional neural network (DF-GCN) for robust recognition of multimodal emotion features in conversations. Specifically, DF-GCN integrates ordinary differential equations into graph convolutional networks (GCNs) to capture the dynamic nature of emotional dependencies within utterance interaction networks, and it leverages prompts generated from the global information vector (GIV) of each utterance to guide the dynamic fusion of multimodal features. This allows the model to change its parameters dynamically when processing each utterance, so that different network parameters can be applied to different emotion categories at inference time, yielding more flexible emotion classification and stronger generalization. Comprehensive experiments on two public multimodal conversational datasets confirm that the proposed DF-GCN delivers superior performance, benefiting significantly from the dynamic fusion mechanism.
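The abstract names two mechanisms: prompt-guided fusion whose weights are derived from a global information vector (GIV), and an ODE-style update over the utterance interaction graph. The paper's actual architecture and parameterization are not given here, so the following is only an illustrative pure-Python sketch of those two ideas; `W_prompt`, the explicit-Euler step, and the diffusion dynamics are assumptions for illustration, not the authors' method.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dynamic_fuse(modal_feats, giv, W_prompt):
    """Fuse per-modality features with weights conditioned on the GIV.

    modal_feats: list of M feature vectors (one per modality), each length d
    giv:         global information vector (length d) for the utterance
    W_prompt:    M x d matrix (assumed) mapping the GIV to one logit per modality
    """
    logits = [sum(w * g for w, g in zip(row, giv)) for row in W_prompt]
    weights = softmax(logits)  # per-utterance fusion weights, not fixed globally
    d = len(modal_feats[0])
    fused = [sum(weights[m] * modal_feats[m][k] for m in range(len(modal_feats)))
             for k in range(d)]
    return fused, weights

def graph_ode_step(H, A, dt=0.1):
    """One explicit-Euler step of dH/dt = A H - H (graph diffusion).

    H: n x d node features over utterance nodes
    A: n x n row-normalized adjacency of the utterance interaction graph
    """
    n, d = len(H), len(H[0])
    out = []
    for i in range(n):
        agg = [sum(A[i][j] * H[j][k] for j in range(n)) for k in range(d)]
        out.append([H[i][k] + dt * (agg[k] - H[i][k]) for k in range(d)])
    return out
```

Because the fusion weights are recomputed from each utterance's GIV, two utterances carrying different emotions can be fused with different effective parameters, which is the behavior the abstract attributes to the dynamic prompting mechanism.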
Problem

Research questions and friction points this paper is trying to address.

Multimodal Emotion Recognition
Dynamic Fusion
Graph Convolutional Network
Emotion Classification
Conversational Context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Fusion
Graph Convolutional Network
Multimodal Emotion Recognition
Ordinary Differential Equations
Prompt-guided Learning
Tao Meng
Central South University of Forestry and Technology
Graph Neural Network, Multimodal Emotion Recognition, Text Classification, Entity Alignment
Weilun Tang
College of Computer and Mathematics, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
Yuntao Shou
College of Computer and Mathematics, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
Yilong Tan
College of Computer and Mathematics, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
Jun Zhou
School of Information and Communication Technology, Griffith University
Spectral Imaging, Image Processing, Pattern Recognition, Remote Sensing
Wei Ai
College of Computer and Mathematics, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
Keqin Li
AMA University
Robotics, Machine Learning, Artificial Intelligence, Computer Vision