Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Open-vocabulary multimodal emotion recognition often struggles with ambiguous modality cues and dynamic contextual variation, leading to unreliable affective judgments. To address this, the paper proposes HyDRA, an architecture that formalizes hybrid-evidential deductive reasoning as a learnable inference trajectory. HyDRA fuses multi-perspective evidence through a Propose-Verify-Decide protocol and uses reinforcement learning with a hierarchical reward mechanism to guide fine-grained reconstruction of affective states. The approach improves robustness under ambiguous or conflicting cues while yielding interpretable, traceable evidence paths. Experiments show that HyDRA significantly outperforms strong baselines across multiple benchmarks, with particularly notable gains in challenging contextual settings.

📝 Abstract
Open-Vocabulary Multimodal Emotion Recognition (OV-MER) is inherently challenging due to the ambiguity of equivocal multimodal cues, which often stem from distinct unobserved situational dynamics. While Multimodal Large Language Models (MLLMs) offer extensive semantic coverage, their performance is often bottlenecked by premature commitment to dominant data priors, resulting in suboptimal heuristics that overlook crucial, complementary affective cues across modalities. We argue that effective affective reasoning requires more than surface-level association; it necessitates reconstructing nuanced emotional states by synthesizing multiple evidence-grounded rationales that reconcile these observations from diverse latent perspectives. We introduce HyDRA, a Hybrid-evidential Deductive Reasoning Architecture that formalizes inference as a Propose-Verify-Decide protocol. To internalize this abductive process, we employ reinforcement learning with hierarchical reward shaping, aligning the reasoning trajectories with final task performance to ensure they best reconcile the observed multimodal cues. Systematic evaluations validate our design choices, with HyDRA consistently outperforming strong baselines--especially in ambiguous or conflicting scenarios--while providing interpretable, diagnostic evidence traces.
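The Propose-Verify-Decide protocol described in the abstract can be caricatured as an evidence-voting loop over per-modality cues, with a small hierarchical reward combining a trace-format term and an outcome term. Everything below (function names, the support score, the reward weights, the fallback rule) is an illustrative assumption for intuition, not the paper's implementation:

```python
def propose(cues):
    """Propose: pool candidate emotion labels from every modality's cues.

    `cues` maps modality name -> set of candidate labels, e.g.
    {"face": {"sad", "anxious"}, "voice": {"anxious"}}.
    """
    candidates = set()
    for labels in cues.values():
        candidates.update(labels)
    return sorted(candidates)

def verify(label, cues):
    """Verify: score a hypothesis by the fraction of modalities supporting it."""
    hits = sum(1 for labels in cues.values() if label in labels)
    return hits / len(cues)

def decide(cues, threshold=0.5):
    """Decide: pick the best-supported hypothesis that passes verification;
    if none passes, fall back to the highest-scoring candidate overall."""
    scored = [(verify(label, cues), label) for label in propose(cues)]
    verified = [item for item in scored if item[0] >= threshold]
    support, label = max(verified or scored)
    return label, support

def hierarchical_reward(trace_well_formed, prediction, gold):
    """Toy two-level reward: a small shaping term for a well-formed
    reasoning trace plus a larger term for the correct final label."""
    r_format = 0.2 if trace_well_formed else 0.0
    r_outcome = 1.0 if prediction == gold else 0.0
    return r_format + r_outcome

cues = {"face": {"sad", "anxious"}, "voice": {"anxious"}, "text": {"anxious", "angry"}}
label, support = decide(cues)          # → ("anxious", 1.0)
reward = hierarchical_reward(True, label, "anxious")  # → 1.2
```

In this toy version, "anxious" wins because it is the only hypothesis consistent with all three modalities; ambiguous cues like "sad" and "angry" survive the Propose step but fail Verify.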
Problem

Research questions and friction points this paper is trying to address.

Open-Vocabulary Multimodal Emotion Recognition
Multimodal Ambiguity
Emotional State Reconstruction
Affective Reasoning
Situational Dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid-evidential Reasoning
Open-Vocabulary Multimodal Emotion Recognition
Deductive Inference
Reinforcement Learning with Hierarchical Rewards
Multimodal Large Language Models
Yu Liu
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
Lei Zhang
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Agentic Coding · Reinforcement Learning · Large Language Model
Haoxun Li
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
Hanlei Shi
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
Yuxuan Ding
Qualcomm AI Research
Vision-and-Language · Large Language Model · Efficient AI
Leyuan Qu
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences
Speech Representation Learning · Multi-modal Learning and Affective Computing
Taihao Li
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China