🤖 AI Summary
This work investigates how vision-language models (VLMs) handle multimodal conflicts, aiming to disentangle conflict detection from conflict resolution mechanistically. The authors introduce a mechanistic attribution framework that combines linear probing (used to verify the decodability of conflict signals) with grouped attention-pattern analysis, applied to LLaVA-OV-7B. The key empirical finding is that conflict-detection signals are linearly separable in the model's intermediate layers; moreover, detection and resolution exhibit distinct layer-wise attention patterns, with detection dominating earlier layers and resolution concentrating in later ones, indicating functional separation along the computational pathway. These results suggest a staged mechanism for multimodal conflict handling in VLMs, improving interpretability and enabling targeted interventions, with implications for conflict-aware VLM design and debugging.
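The linear-probing step described above can be sketched as follows. This is a minimal illustration on synthetic data: the hidden size, dataset size, and label construction are assumptions for demonstration, whereas in the actual study the activations would be extracted from an intermediate layer of LLaVA-OV-7B on conflicting vs. consistent inputs.

```python
# Toy sketch of linear probing for a conflict signal (synthetic data).
# All shapes and the planted signal direction are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model, n_examples = 128, 400  # toy stand-ins for the real hidden size / dataset

# Synthetic intermediate-layer activations: conflicting inputs (label 1) are
# shifted along a fixed direction, mimicking a linearly decodable signal.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=n_examples)
acts = rng.normal(size=(n_examples, d_model)) + 3.0 * labels[:, None] * direction

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)
print(f"probe accuracy: {acc:.2f}")  # well above chance when the signal is linear
```

In the paper's setting, probe accuracy as a function of layer index is what localizes where the conflict signal becomes decodable.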
📝 Abstract
This paper addresses the challenge of decomposing conflict detection from conflict resolution in Vision-Language Models (VLMs) and presents two complementary approaches: supervised linear probes and group-based attention pattern analysis. We conduct a mechanistic investigation of LLaVA-OV-7B, a state-of-the-art VLM that exhibits diverse resolution behaviors when faced with conflicting multimodal inputs. Our results show that a linearly decodable conflict signal emerges in the model's intermediate layers, and that the attention patterns associated with conflict detection and resolution diverge at different stages of the network. These findings support the hypothesis that detection and resolution are functionally distinct mechanisms. We discuss how this decomposition enables more actionable interpretability and targeted interventions for improving model robustness in challenging multimodal settings.
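The group-based attention analysis can be sketched as computing, per layer, how much attention mass flows to each token group. The sketch below uses random synthetic attention maps; the layer count, head count, and the image/text token boundaries are hypothetical placeholders, whereas real maps would come from the model's forward pass.

```python
# Sketch of grouped attention-pattern analysis on synthetic attention maps.
# Token-group positions and tensor sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_layers, n_heads, seq_len = 8, 4, 32
image_tokens = slice(0, 16)   # hypothetical positions of image tokens
text_tokens = slice(16, 32)   # hypothetical positions of text tokens

# Random row-stochastic attention maps: [layer, head, query, key].
logits = rng.normal(size=(n_layers, n_heads, seq_len, seq_len))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Group attention mass by key type: how much each layer attends to image vs.
# text tokens, averaged over heads and query positions.
image_mass = attn[..., image_tokens].sum(axis=-1).mean(axis=(1, 2))
text_mass = attn[..., text_tokens].sum(axis=-1).mean(axis=(1, 2))
for layer, (im, tx) in enumerate(zip(image_mass, text_mass)):
    print(f"layer {layer}: image={im:.2f} text={tx:.2f}")
```

Comparing these per-layer group profiles between conflicting and consistent inputs is what lets detection-associated and resolution-associated patterns be localized to different stages of the network.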