When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs

📅 2025-11-04

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study investigates the “modality following” mechanism in multimodal large language models (MLLMs) when image and text modalities conflict. Addressing the lack of interpretable decision modeling in prior work, we propose the first dual-factor disentanglement framework, which decomposes modality selection into relative reasoning uncertainty (measured via entropy) and intrinsic modality preference (quantified via deconfounded metrics). Using a controllably constructed dataset and layer-wise prediction probes, we uncover a dynamic trade-off between the two factors: the probability of following a modality decreases monotonically as its relative uncertainty increases. We also identify cross-layer oscillation, which explains decision instability. Our framework enables fine-grained, interpretable, and quantitative modeling of MLLM modality conflict resolution, establishing a theoretical foundation and analytical toolkit for trustworthy multimodal reasoning.

📝 Abstract

Multimodal large language models (MLLMs) must resolve conflicts when different modalities provide contradictory information, a process we term modality following. Prior work measured this behavior only with coarse dataset-level statistics, overlooking the influence of the model's confidence in unimodal reasoning. In this paper, we introduce a new framework that decomposes modality following into two fundamental factors: relative reasoning uncertainty (the case-specific confidence gap between unimodal predictions) and inherent modality preference (a model's stable bias when uncertainties are balanced). To validate this framework, we construct a controllable dataset that systematically varies the reasoning difficulty of visual and textual inputs. Using entropy as a fine-grained uncertainty metric, we uncover a universal law: the probability of following a modality decreases monotonically as its relative uncertainty increases. The relative difficulty level at which the model follows both modalities with comparable probability, which we call the balance point, serves as a practical indicator of the model's inherent preference. Unlike traditional macro-level ratios, this measure offers a more principled and less confounded way to characterize modality bias, disentangling it from unimodal capabilities and dataset artifacts. Further, by probing layer-wise predictions, we reveal the internal mechanism of oscillation: in ambiguous regions near the balance point, models vacillate between modalities across layers, explaining externally observed indecision. Together, these findings establish relative uncertainty and inherent preference as the two governing principles of modality following, offering both a quantitative framework and mechanistic insight into how MLLMs resolve conflicting information.
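The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula for relative uncertainty, so the plain entropy difference used here is an assumption, as are the function names `entropy`, `relative_uncertainty`, and `balance_point` and the linear interpolation used to locate the 0.5 crossing.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution over answer options."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def relative_uncertainty(text_probs, image_probs):
    """Assumed definition: entropy of the text-only prediction minus entropy of
    the image-only prediction. Positive values mean the model is less certain
    about the text input than about the image input."""
    return entropy(text_probs) - entropy(image_probs)

def balance_point(points):
    """Estimate where the text-following rate crosses 0.5.

    `points` is a list of (relative_uncertainty, text_following_rate) pairs.
    The crossing is found by linear interpolation between neighbouring points
    (an illustrative choice); returns None if the curve never crosses 0.5.
    """
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None

if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from the paper.
    gap = relative_uncertainty([0.5, 0.5], [0.9, 0.1])
    curve = [(-1.0, 0.9), (0.0, 0.7), (1.0, 0.3)]
    print(gap, balance_point(curve))
```

Under this sign convention, a balance point above zero would mean the model keeps following text even when text is somewhat more uncertain than the image, which is one way to read an inherent text preference.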

Problem

Research questions and friction points this paper is trying to address.

Measuring how MLLMs resolve conflicts between contradictory multimodal information sources

Quantifying how unimodal reasoning uncertainty influences modality preference dynamics

Developing a framework to separate inherent modality bias from capability differences

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes modality following into uncertainty and preference factors

Uses entropy as a fine-grained metric to quantify reasoning uncertainty

Reveals layer-wise oscillation mechanism in ambiguous decision regions
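One simple way to picture the layer-wise oscillation is to count how often the favored modality flips from one layer to the next. The Python sketch below assumes per-layer probabilities for the text-consistent and image-consistent answers are already available. How the paper obtains them through probing is not described in this summary, and the helper names `layer_choices` and `count_oscillations` are hypothetical.

```python
def layer_choices(text_probs_by_layer, image_probs_by_layer):
    """Label each layer by the modality whose answer it currently favors.
    Ties are assigned to "image" (an arbitrary choice for this sketch)."""
    return [
        "text" if t > v else "image"
        for t, v in zip(text_probs_by_layer, image_probs_by_layer)
    ]

def count_oscillations(choices):
    """Count the flips between consecutive layers in a sequence of choices."""
    return sum(1 for a, b in zip(choices, choices[1:]) if a != b)

if __name__ == "__main__":
    # Made-up per-layer probabilities for illustration only.
    text_p = [0.40, 0.55, 0.45, 0.60, 0.65]
    image_p = [0.60, 0.45, 0.55, 0.40, 0.35]
    choices = layer_choices(text_p, image_p)
    print(choices, count_oscillations(choices))
```

A higher flip count for inputs near the balance point than for inputs far from it would be consistent with the vacillation the abstract describes.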
