🤖 AI Summary
Multimodal emotion recognition in conversations (MERC) is often hindered by modality-specific noise and limited contextual reasoning. To address these challenges, this work proposes SURE, a framework that integrates uncertainty modeling with iterative contextual reasoning. SURE employs an uncertainty-aware mixture-of-experts module to suppress noisy modalities, models multi-turn dialogue through an iterative inference mechanism, and introduces a Transformer-based gating module to capture both intra- and inter-modality interactions. Experiments show that SURE outperforms state-of-the-art methods across multiple MERC benchmarks, with notable gains in robustness and fine-grained emotional reasoning.
📝 Abstract
Multimodal emotion recognition in conversations (MERC) requires integrating multimodal signals while remaining robust to noise and modeling contextual reasoning. Existing approaches often emphasize fusion but overlook both the uncertainty introduced by noisy features and fine-grained contextual reasoning. We propose SURE (Synergistic Uncertainty-aware REasoning), a MERC framework that improves robustness and contextual modeling. SURE consists of three components: an Uncertainty-Aware Mixture-of-Experts module that handles modality-specific noise, an Iterative Reasoning module that performs multi-turn reasoning over dialogue context, and a Transformer Gate module that captures intra- and inter-modal interactions. Experiments on benchmark MERC datasets show that SURE consistently outperforms state-of-the-art methods, demonstrating its effectiveness in robust multimodal reasoning. These results highlight the importance of uncertainty modeling and iterative reasoning for emotion recognition in conversational settings.
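To make the uncertainty-aware fusion idea concrete, the following is a minimal sketch of how per-modality uncertainty could down-weight noisy experts. The abstract does not specify SURE's actual formulation, so everything here is an illustrative assumption: `fuse_with_uncertainty` is a hypothetical helper, and predictive entropy is used as a simple stand-in uncertainty measure.

```python
# Hypothetical sketch: uncertainty-weighted fusion of per-modality "experts".
# This is NOT the paper's implementation; entropy is used as a stand-in
# uncertainty proxy, and all names here are illustrative.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    # Predictive entropy: higher means the modality is less certain.
    return -sum(p * math.log(p) for p in probs if p > 0)

def fuse_with_uncertainty(modality_logits):
    """Weight each modality's class distribution by exp(-entropy),
    so noisier (higher-entropy) modalities contribute less."""
    dists = [softmax(l) for l in modality_logits]
    weights = [math.exp(-entropy(d)) for d in dists]
    z = sum(weights)
    weights = [w / z for w in weights]
    n_classes = len(dists[0])
    fused = [sum(w * d[c] for w, d in zip(weights, dists))
             for c in range(n_classes)]
    return fused, weights

# Example: a confident text modality vs. a noisy (near-uniform) audio modality.
text_logits = [3.0, 0.1, 0.1]   # peaked  -> low entropy  -> high weight
audio_logits = [0.5, 0.4, 0.5]  # flat    -> high entropy -> low weight
fused, weights = fuse_with_uncertainty([text_logits, audio_logits])
```

In this toy example the confident text modality receives the larger fusion weight, mirroring the suppression of noisy modalities described above; a learned gating network would replace the fixed entropy heuristic in practice.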