AI Summary
Current multimodal sentiment understanding methods suffer from reasoning drift: models over-rely on self-generated textual explanations rather than original multimodal evidence, and their interpretability paths exhibit visual modality bias, undermining sentiment judgment reliability. To address this, we propose the Modality Importance-Guided Reasoning (MIGR) framework, the first to dynamically reconstruct reasoning sequences starting from the emotion-dominant modality. MIGR employs a two-stage training strategy: modality-aligned supervised fine-tuning followed by modality-aware reward optimization, guiding multimodal large language models to attend to critical evidential cues. This effectively suppresses reasoning drift while ensuring emotional consistency and causal relevance in explanations. On the DFEW benchmark, MIGR reduces the proportion of correct sentiment predictions with inconsistent explanations from 18.10% to 7.37%, significantly enhancing reasoning reliability.
Abstract
In this paper, we present Modality-Importance-Guided Reasoning (MIGR), a framework designed to improve the reliability of reasoning-based multimodal emotion understanding in multimodal large language models. Although existing methods have advanced emotion understanding, they often suffer from reasoning drift: models gradually rely on their own generated text instead of multimodal evidence, and their explanations are overly shaped by visually initiated reasoning paths. To address these issues, we introduce Modality Importance (MI), a simple yet effective mechanism for identifying the emotion-dominant modality. Using MI, MIGR reorganizes reasoning sequences so that explanations begin from the modality most critical to the target emotion, preventing early reasoning from being misled by less informative cues. Our two-stage framework, comprising modality-aligned supervised fine-tuning and modality-aware reward optimization, encourages models to generate emotionally grounded, causally relevant, and coherence-preserving explanations. Experimental results on the DFEW benchmark show that MIGR substantially improves reasoning reliability, decreasing instances of correct predictions accompanied by emotionally inconsistent explanations from 18.10% to 7.37%. These results confirm the benefit of initiating reasoning from the emotion-dominant modality.
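The reordering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-modality score dictionary, and the record fields `correct` and `consistent` are all hypothetical, and the abstract does not say how MI scores are computed. The sketch simply assumes one score per modality is already available and places the highest-scoring modality first. The reliability figure is computed here over all samples, which is also an assumption, since the abstract does not state the denominator.

```python
from typing import Dict, List


def order_by_modality_importance(
    segments: Dict[str, str], mi_scores: Dict[str, float]
) -> List[str]:
    """Return modality names sorted from highest to lowest MI score.

    Modalities missing from mi_scores are treated as score 0.0 (assumption).
    sorted() is stable, so ties keep the original insertion order.
    """
    return sorted(segments, key=lambda m: mi_scores.get(m, 0.0), reverse=True)


def build_reasoning(segments: Dict[str, str], mi_scores: Dict[str, float]) -> str:
    """Concatenate per-modality reasoning, emotion-dominant modality first."""
    order = order_by_modality_importance(segments, mi_scores)
    return " ".join(f"[{m}] {segments[m]}" for m in order)


def inconsistent_correct_rate(records: List[Dict[str, bool]]) -> float:
    """Percentage of samples with a correct label but an inconsistent explanation.

    Each record is a hypothetical dict with boolean keys 'correct' and
    'consistent'. The denominator is the total sample count (assumption).
    """
    if not records:
        return 0.0
    hits = sum(1 for r in records if r["correct"] and not r["consistent"])
    return 100.0 * hits / len(records)


if __name__ == "__main__":
    segments = {
        "visual": "The face is mostly neutral.",
        "audio": "The voice trembles and rises in pitch.",
        "text": "The subtitle reads 'I am fine.'",
    }
    mi_scores = {"visual": 0.2, "audio": 0.7, "text": 0.1}  # illustrative values
    print(build_reasoning(segments, mi_scores))
```

In this toy input the audio segment leads the explanation because it carries the highest illustrative score, mirroring the abstract's claim that reasoning should start from the emotion-dominant modality rather than defaulting to a visual-first path.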