Learning What to Attend First: Modality-Importance-Guided Reasoning for Reliable Multimodal Emotion Understanding

📅 2025-12-02

🤖 AI Summary

Current multimodal emotion understanding methods suffer from reasoning drift: models over-rely on their own generated textual explanations rather than the original multimodal evidence, and their reasoning paths are biased toward the visual modality, which undermines the reliability of emotion judgments. To address this, we propose the Modality-Importance-Guided Reasoning (MIGR) framework, described as the first to dynamically reorganize reasoning sequences so that they start from the emotion-dominant modality. MIGR uses a two-stage training strategy, modality-aligned supervised fine-tuning followed by modality-aware reward optimization, to guide multimodal large language models toward critical evidential cues. This suppresses reasoning drift while keeping explanations emotionally consistent and causally relevant. On the DFEW benchmark, MIGR reduces the proportion of correct emotion predictions accompanied by inconsistent explanations from 18.10% to 7.37%, a substantial gain in reasoning reliability.

📝 Abstract

In this paper, we present Modality-Importance-Guided Reasoning (MIGR), a framework designed to improve the reliability of reasoning-based multimodal emotion understanding in multimodal large language models. Although existing methods have advanced emotion understanding, they often suffer from reasoning drift: models gradually rely on their own generated text instead of multimodal evidence, and their explanations are overly shaped by visually initiated reasoning paths. To address these issues, we introduce Modality Importance (MI), a simple yet effective mechanism for identifying the emotion-dominant modality. Using MI, MIGR reorganizes reasoning sequences so that explanations begin from the modality most critical to the target emotion, preventing early reasoning from being misled by less informative cues. Our two-stage framework, comprising modality-aligned supervised fine-tuning and modality-aware reward optimization, encourages models to generate emotionally grounded, causally relevant, and coherence-preserving explanations. Experimental results on the DFEW benchmark show that MIGR substantially improves reasoning reliability, decreasing instances of correct predictions accompanied by emotionally inconsistent explanations from 18.10% to 7.37%. These results confirm the benefit of initiating reasoning from the emotion-dominant modality.
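The reliability figure reported above (correct predictions accompanied by emotionally inconsistent explanations) can be illustrated with a minimal sketch. The abstract does not say how explanation consistency is judged or which denominator is used, so everything here is an assumption: the `Sample` type, the `explanation_label` field (the emotion an external judge reads from the explanation), and the choice to divide by the number of correct predictions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    predicted_label: str    # emotion the model predicted
    true_label: str         # ground-truth emotion
    explanation_label: str  # emotion implied by the explanation (hypothetical judge output)


def inconsistent_correct_rate(samples):
    """Share of correct predictions whose explanation implies a different emotion.

    Assumption: the rate is taken over correct predictions only; the source
    does not state the denominator.
    """
    correct = [s for s in samples if s.predicted_label == s.true_label]
    if not correct:
        return 0.0
    inconsistent = sum(
        1 for s in correct if s.explanation_label != s.predicted_label
    )
    return inconsistent / len(correct)
```

Under this reading, a lower rate means fewer cases where the model reaches the right label through an explanation that points to a different emotion.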

Problem

Research questions and friction points this paper is trying to address.

Addresses reasoning drift in multimodal emotion understanding models.

Prevents early reasoning from being misled by less informative cues.

Improves reliability of emotionally grounded and coherent explanations.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-Importance-Guided Reasoning (MIGR) framework improves multimodal emotion understanding

Modality Importance (MI) mechanism identifies emotion-dominant modality to guide reasoning

Two-stage training with fine-tuning and reward optimization ensures emotionally grounded explanations
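The reordering idea described in the abstract, where explanations begin from the emotion-dominant modality, can be sketched as follows. This is only an illustration under stated assumptions: the source does not describe how Modality Importance scores are computed, so `importance` is a hypothetical per-modality score supplied by the caller, and `reorder_reasoning` is an illustrative name, not part of MIGR.

```python
def reorder_reasoning(segments, importance):
    """Move the reasoning segment of the highest-scoring modality to the front.

    segments:   dict mapping modality name -> reasoning text, in original order
    importance: dict mapping modality name -> score (hypothetical MI values)

    The remaining segments keep their original relative order; the source only
    states that reasoning should start from the dominant modality.
    """
    if not segments:
        return []
    # Modalities without a score are treated as least important (assumption).
    dominant = max(segments, key=lambda m: importance.get(m, 0.0))
    order = [dominant] + [m for m in segments if m != dominant]
    return [(m, segments[m]) for m in order]


example = reorder_reasoning(
    {"visual": "Brows are lowered.", "audio": "Voice is raised and tense."},
    {"visual": 0.3, "audio": 0.7},
)
```

Here `example` starts with the audio segment because it has the larger hypothetical score, so the visual cue no longer opens the explanation.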
