When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models

📅 2025-07-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models (VLMs) frequently exhibit hallucinations when internal priors conflict with external visual inputs. This work investigates the decision-making mechanisms underlying such cross-modal tension and constructs a multimodal counterfactual dataset that explicitly elicits knowledge–vision conflicts. We propose a logit-difference-based localization method to precisely identify the attention heads most responsible for "visual override" behavior. Furthermore, we design an intervention mechanism that, by modifying these heads, dynamically modulates the model's reliance on either visual input or linguistic priors. Through a comparative analysis of attention attribution versus gradient attribution, we find the former significantly outperforms the latter in localizing the relevant visual regions. To our knowledge, this is the first work enabling observable, localizable, and controllable characterization of VLMs' cross-modal trade-offs, establishing a novel paradigm for mitigating hallucinations and enhancing model reliability.
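
No code accompanies this summary; as a loose illustration of the localization idea, here is a minimal PyTorch sketch of logit-difference head localization, assuming each head's contribution to the last-token residual stream is available (e.g., via forward hooks) and can be projected through the unembedding matrix. All dimensions, the `head_out` tensor, `W_U`, and the `visual_tok`/`prior_tok` ids are hypothetical stand-ins, not the authors' implementation.

```python
import torch

torch.manual_seed(0)
n_layers, n_heads, d_model, vocab = 8, 8, 512, 1000

# head_out[l, h] = hypothetical contribution of head (l, h) to the
# residual stream at the final token position (would come from hooks).
head_out = torch.randn(n_layers, n_heads, d_model)
W_U = torch.randn(d_model, vocab)        # stand-in unembedding matrix

visual_tok, prior_tok = 101, 202         # hypothetical answer-token ids

# Project every head's contribution to logits, then take the gap between
# the visually grounded answer and the parametric-knowledge answer.
logits = head_out @ W_U                  # (n_layers, n_heads, vocab)
logit_diff = logits[..., visual_tok] - logits[..., prior_tok]

# Heads with the largest |logit difference| are candidate conflict heads.
top = torch.topk(logit_diff.abs().flatten(), k=5).indices
for idx in top:
    l, h = divmod(idx.item(), n_heads)
    print(f"layer {l} head {h}: logit diff {logit_diff[l, h]:+.2f}")
```

Heads whose contribution most shifts the logit gap between the visually grounded answer and the prior-knowledge answer are the natural candidates for the "visual override" heads the summary describes.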

📝 Abstract
Vision-language models (VLMs) increasingly leverage diverse knowledge sources to address complex tasks, often encountering conflicts between their internal parametric knowledge and external information. Knowledge conflicts can result in hallucinations and unreliable responses, but the mechanisms governing such interactions remain unknown. To address this gap, we analyze the mechanisms that VLMs use to resolve cross-modal conflicts by introducing a dataset of multimodal counterfactual queries that deliberately contradict internal commonsense knowledge. We localize with logit inspection a small set of heads that control the conflict. Moreover, by modifying these heads, we can steer the model towards its internal knowledge or the visual inputs. Finally, we show that attention from such heads pinpoints localized image regions driving visual overrides, outperforming gradient-based attribution in precision.
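
To make the abstract's final claim concrete, here is a self-contained toy sketch contrasting the two attribution styles over image patches: attention weights read from a (hypothetical) conflict head versus a gradient-saliency map of the answer logit. The linear scorer, the patch grid, and all tensors are illustrative assumptions, not the paper's models or data.

```python
import torch

torch.manual_seed(0)
n_patches, d = 16 * 16, 64               # e.g. a 16x16 image-patch grid
patches = torch.randn(n_patches, d, requires_grad=True)

# Attention attribution: the conflict head's query at the answer position
# attends over the image patches (q is a random stand-in).
q = torch.randn(d)
attn = torch.softmax(patches.detach() @ q / d**0.5, dim=0)   # (n_patches,)

# Gradient attribution: saliency of the answer logit w.r.t. each patch
# (a toy linear scorer stands in for the VLM's answer logit).
w = torch.randn(d)
logit = (patches @ w).sum()
logit.backward()
saliency = patches.grad.norm(dim=-1)                         # (n_patches,)

# Compare which patches each method ranks as most responsible.
print("top patches by attention:", torch.topk(attn, 3).indices.tolist())
print("top patches by gradient :", torch.topk(saliency, 3).indices.tolist())
```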
Problem

Research questions and friction points this paper is trying to address.

Analyzing how VLMs resolve conflicts between internal and external knowledge
Identifying specific model heads controlling cross-modal knowledge conflicts
Improving attribution precision for visual overrides in VLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset of multimodal counterfactual queries for analysis
Localize conflict-controlling heads via logit inspection
Modify the localized heads to steer the model toward its internal knowledge or the visual input (a minimal sketch follows this list)
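
A minimal sketch of what such a head intervention could look like, assuming the localized heads' residual-stream contributions can be rescaled by a factor alpha (alpha > 1 amplifying reliance on vision, 0 <= alpha < 1 reverting toward parametric knowledge). The `steer` helper, the head indices, and all shapes are hypothetical, not the paper's actual mechanism.

```python
import torch

def steer(residual: torch.Tensor,
          head_out: torch.Tensor,
          conflict_heads: list[tuple[int, int]],
          alpha: float) -> torch.Tensor:
    """Replace each conflict head's contribution c with alpha * c."""
    steered = residual.clone()
    for layer, head in conflict_heads:
        c = head_out[layer, head]
        steered = steered - c + alpha * c   # net effect: (alpha - 1) * c
    return steered

# Toy usage with random stand-ins for the real activations.
torch.manual_seed(0)
d_model = 512
residual = torch.randn(d_model)             # last-token residual stream
head_out = torch.randn(8, 8, d_model)       # per-head contributions
heads = [(3, 5), (6, 2)]                    # hypothetical conflict heads
vision_biased = steer(residual, head_out, heads, alpha=2.0)
prior_biased = steer(residual, head_out, heads, alpha=0.0)
```

Setting alpha to zero ablates the conflict heads entirely, which under this framing should push the model back onto its internal commonsense answer.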