Federated Dialogue-Semantic Diffusion for Emotion Recognition under Incomplete Modalities

📅 2025-10-31
🤖 AI Summary
Problem: Multimodal Emotion Recognition in Conversation (MERC) degrades significantly when modalities go missing unpredictably, and existing imputation methods often introduce semantic distortion, especially under extreme missing patterns such as fixed-modality absence. Method: This paper integrates federated learning into missing-modality recovery, proposing the Federated Dialogue-guided and Semantic-Consistent Diffusion (FedDISC) framework. It models contextual and speaker dependencies with a dialogue graph network and uses a semantic-conditioned diffusion model for decentralized, cross-client modality generation. An Alternating Frozen Aggregation strategy, which cyclically freezes the recovery and classifier modules, is introduced to keep collaborative training stable. Contribution/Results: The framework is reported to achieve state-of-the-art performance on IEMOCAP, CMU-MOSI, and CMU-MOSEI across diverse missing patterns. It enables high-fidelity modality reconstruction and semantically consistent multimodal fusion while preserving data privacy through decentralized learning.

📝 Abstract
Multimodal Emotion Recognition in Conversations (MERC) enhances emotional understanding through the fusion of multimodal signals. However, unpredictable modality absence in real-world scenarios significantly degrades the performance of existing methods. Conventional missing-modality recovery approaches, which depend on training with complete multimodal data, often suffer from semantic distortion under extreme data distributions, such as fixed-modality absence. To address this, we propose the Federated Dialogue-guided and Semantic-Consistent Diffusion (FedDISC) framework, pioneering the integration of federated learning into missing-modality recovery. By aggregating modality-specific diffusion models trained on clients and broadcasting them to clients that lack the corresponding modalities, FedDISC removes the reliance on any single client having complete modalities. Additionally, the DISC-Diffusion module ensures consistency in context, speaker identity, and semantics between recovered and available modalities, using a Dialogue Graph Network to capture conversational dependencies and a Semantic Conditioning Network to enforce semantic alignment. We further introduce a novel Alternating Frozen Aggregation strategy, which cyclically freezes the recovery and classifier modules to facilitate collaborative optimization. Extensive experiments on the IEMOCAP, CMU-MOSI, and CMU-MOSEI datasets demonstrate that FedDISC outperforms existing approaches in emotion classification across diverse missing-modality patterns.
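
The aggregate-then-broadcast step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain sample-size-weighted average (FedAvg-style) as the aggregation rule, uses short lists of floats as stand-ins for diffusion-model parameters, and the names `fedavg`, `aggregate_and_broadcast`, and the client dictionary layout are hypothetical.

```python
from typing import Dict, List

Weights = List[float]  # toy stand-in for one diffusion model's parameters


def fedavg(updates: List[Weights], sizes: List[int]) -> Weights:
    """Sample-size-weighted average of parameter vectors (assumed rule)."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(w[i] * n for w, n in zip(updates, sizes)) / total for i in range(dim)]


def aggregate_and_broadcast(clients: Dict[str, dict]) -> Dict[str, Dict[str, Weights]]:
    """Aggregate each modality's model over the clients that hold that
    modality, then route the result to the clients that lack it."""
    modalities = sorted({m for c in clients.values() for m in c["models"]})
    global_models = {}
    for m in modalities:
        owners = [c for c in clients.values() if m in c["models"]]
        global_models[m] = fedavg(
            [c["models"][m] for c in owners], [c["n"] for c in owners]
        )
    # Each client receives the global model for every modality it does not hold.
    return {
        name: {m: g for m, g in global_models.items() if m not in c["models"]}
        for name, c in clients.items()
    }


# Hypothetical setup: three clients, each permanently missing one modality.
clients = {
    "A": {"n": 100, "models": {"audio": [1.0, 2.0], "text": [0.0, 0.0]}},
    "B": {"n": 300, "models": {"audio": [3.0, 6.0], "visual": [1.0, 1.0]}},
    "C": {"n": 100, "models": {"text": [2.0, 4.0], "visual": [3.0, 5.0]}},
}
received = aggregate_and_broadcast(clients)
print(received["A"])  # client A lacks "visual", so it receives that model
```

In this toy setup no client ever sees all three modalities, yet each one ends up with a recovery model for its missing modality, which is the property the abstract attributes to federated aggregation.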
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation in emotion recognition with missing modalities
Overcomes semantic distortion in conventional missing-modality recovery methods
Pioneers federated learning integration for robust multimodal emotion recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated aggregation of modality-specific diffusion models for missing-modality recovery
Dialogue and semantic networks ensure consistent modality recovery
Alternating frozen aggregation strategy enables collaborative optimization cycles
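
A minimal sketch of how an alternating freeze-and-aggregate schedule might look. The source only states that the recovery and classifier modules are frozen cyclically, so everything else here is an assumption: the one-round alternation period, the unweighted mean as the aggregation rule, and the toy `local_step` that stands in for real local training. All function names are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # {"recovery": [...], "classifier": [...]}


def active_module(round_idx: int) -> str:
    """Assumed schedule: even rounds train recovery, odd rounds the classifier."""
    return "recovery" if round_idx % 2 == 0 else "classifier"


def local_step(params: Params, trainable: str, delta: float) -> Params:
    """Toy local training: only the trainable module moves; the other stays frozen."""
    return {
        name: [v + delta for v in vals] if name == trainable else list(vals)
        for name, vals in params.items()
    }


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise unweighted mean of equally sized vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def run_round(global_params: Params, client_deltas: List[float], round_idx: int) -> Params:
    trainable = active_module(round_idx)
    local_params = [local_step(global_params, trainable, d) for d in client_deltas]
    new_params = {k: list(v) for k, v in global_params.items()}
    # Aggregate only the module that was trained this round.
    new_params[trainable] = mean([p[trainable] for p in local_params])
    return new_params


params = {"recovery": [0.0, 0.0], "classifier": [0.0]}
history = []
for r in range(4):
    params = run_round(params, client_deltas=[1.0, 3.0], round_idx=r)
    history.append((active_module(r), params))

for module, snapshot in history:
    print(module, snapshot)
```

Each round changes exactly one module while the other keeps its previous global value, so the classifier is always trained against a fixed recovery model and vice versa.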
Xihang Qiu
Shenzhen MSU-BIT University, Beijing Institute of Technology
Jiarong Cheng
Shenzhen MSU-BIT University, Beijing Institute of Technology
Yuhao Fang
Shenzhen MSU-BIT University
Wanpeng Zhang
Ph.D. Candidate, Peking University
Machine Learning, Reinforcement Learning, Language Modeling
Yao Lu
Shenzhen MSU-BIT University
Ye Zhang
Shenzhen MSU-BIT University, Beijing Institute of Technology
Chun Li
MD Anderson Cancer Center
diagnostic imaging, drug delivery, nanotechnology