AI Summary
This work addresses how multimodal large language models (MLLMs) are misled by textual misinformation when visual and textual inputs conflict, which often causes them to hallucinate and ignore visual evidence. Using path patching for head-level causal analysis across five open-source MLLMs, the study reveals, for the first time, a functional antagonism between attention heads that drive hallucinations and heads that resist them. Driving effects are broadly distributed and carry greater aggregate weight, while resisting effects concentrate in a few heads, producing an imbalanced routing structure. Building on this insight, the authors propose MACI (Modality-conflict-Aware Causal Intervention), which suppresses the causally identified hallucination-driving heads only when cross-modal conflict is detected. Across five MLLMs, MACI achieves the largest hallucination reduction among the compared inference-time baselines on the MMMC benchmark with a favorable hallucination-accuracy trade-off, and it transfers zero-shot to the SCI-SemanticConflict test set.
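As a rough illustration of head-level causal analysis, the sketch below uses per-head activation patching, a simpler relative of the path patching used in the study, on a toy model. It assumes, hypothetically, that a hallucination score (logit of the premise-consistent answer minus logit of the visually grounded answer) is a linear readout of the summed head outputs. Patching one head's output from the conflicting run into a non-conflicting run then measures that head's causal effect: positive effects mark hallucination-driving heads, negative effects mark hallucination-resisting heads. The function names, the linear readout, the tolerance and all numbers are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def head_patching_effects(clean_heads, conflict_heads, readout):
    """Causal effect of each head on a toy hallucination score.

    Hypothetical toy setup (not the paper's models):
      clean_heads, conflict_heads: (n_heads, d_model) per-head outputs from a
        run without and with the conflicting textual premise.
      readout: (d_model,) direction whose dot product with the summed head
        outputs gives the hallucination score (premise answer minus visual answer).
    """
    def score(heads):
        return float(heads.sum(axis=0) @ readout)

    baseline = score(clean_heads)
    effects = np.zeros(len(clean_heads))
    for i in range(len(clean_heads)):
        patched = clean_heads.copy()
        patched[i] = conflict_heads[i]  # patch only head i from the conflict run
        effects[i] = score(patched) - baseline
    return effects


def split_heads(effects, tol=1e-6):
    """Positive effect -> hallucination-driving; negative -> hallucination-resisting."""
    driving = np.flatnonzero(effects > tol)
    resisting = np.flatnonzero(effects < -tol)
    return driving, resisting


if __name__ == "__main__":
    readout = np.array([1.0, 0.0])
    clean = np.zeros((5, 2))
    conflict = np.array([[0.6, 0.0], [0.5, 0.0], [0.4, 0.0],
                         [-1.2, 0.0], [0.0, 3.0]])
    effects = head_patching_effects(clean, conflict, readout)
    driving, resisting = split_heads(effects)
    print("driving heads:", driving, "total effect:", effects[driving].sum())
    print("resisting heads:", resisting, "total effect:", effects[resisting].sum())
```

In these made-up numbers, three small driving effects (summing to about 1.5) outweigh one larger resisting effect (-1.2). This mirrors in miniature the distributed-versus-concentrated asymmetry the study describes. A real analysis would measure effects on the model itself rather than on a linear toy.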
Abstract
Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads. We find a consistent asymmetry: driving effects are more broadly distributed and carry greater aggregate weight, whereas resisting effects concentrate in a small number of high-importance heads. Ablation experiments further confirm that these groups exert opposing effects during generation: distributed driving influence and localized resistance together form an imbalanced routing structure that biases generation toward the erroneous premise. Motivated by this finding, we propose MACI (Modality-conflict-Aware Causal Intervention), a conditional intervention that suppresses causally identified hallucination-driving heads only when conflict is detected. Across five MLLMs, MACI achieves the largest hallucination reduction among compared inference-time baselines on the MMMC benchmark with a favorable hallucination-accuracy trade-off, and transfers zero-shot to the SCI-SemanticConflict test.
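The conditional intervention can be pictured as a gate on per-head contributions. What follows is a minimal sketch under stated assumptions, not the authors' implementation. The conflict detector is abstracted into a precomputed scalar score, and per-head outputs are given as an array rather than captured from a real MLLM. The function name, `threshold` and `scale` are hypothetical.

```python
import numpy as np


def conditional_head_suppression(head_outputs, driving_heads, conflict_score,
                                 threshold=0.5, scale=0.0):
    """Sum per-head outputs, down-weighting driving heads only under conflict.

    head_outputs: (n_heads, d_model) per-head contributions at one layer.
    driving_heads: indices of heads previously identified as hallucination-driving.
    conflict_score: output of a hypothetical conflict detector (higher = stronger
        text-image contradiction).
    threshold, scale: hypothetical knobs; scale=0.0 fully ablates the heads.
    """
    weights = np.ones(head_outputs.shape[0])
    if conflict_score > threshold:  # intervene only when a conflict is detected
        weights[np.asarray(driving_heads, dtype=int)] = scale
    return (weights[:, None] * head_outputs).sum(axis=0)


if __name__ == "__main__":
    heads = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
    driving = [2]
    print(conditional_head_suppression(heads, driving, conflict_score=0.1))  # no conflict
    print(conditional_head_suppression(heads, driving, conflict_score=0.9))  # conflict
```

Because the gate fires only above the threshold, inputs judged non-conflicting are combined exactly as without the intervention. This kind of conditionality is presumably what allows hallucination to drop without an accuracy cost on every input.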