🤖 AI Summary
This study addresses the persistent issue of hallucination and misidentification in vision-language models when processing multi-object scenes, a phenomenon often attributed to representational confusion whose underlying mechanisms remain poorly understood. By analyzing the geometric structure of internal representations in open-source models such as Qwen, InternVL, and Gemma, we identify “concept vectors” that encode visual concepts and, for the first time, establish a quantitative relationship between the geometric overlap of these vectors and specific error patterns. Through a combination of representational analysis and targeted intervention experiments, our work not only uncovers an interpretable mechanism behind model failures but also enables reliable manipulation of model perceptual behavior, offering a novel framework for understanding and improving the internal representations of vision-language models.
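As a rough illustration of the kind of analysis described above, the sketch below computes pairwise geometric overlap (cosine similarity) between concept vectors. The vector dimensionality, the concept names, and the idea of correlating overlaps with per-pair error rates are illustrative assumptions for exposition, not details taken from the paper.

```python
import torch

def cosine_overlap(concept_vectors: dict[str, torch.Tensor]) -> dict[tuple[str, str], float]:
    """Pairwise cosine similarity between concept vectors (one direction per visual concept)."""
    names = list(concept_vectors)
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = torch.nn.functional.cosine_similarity(
                concept_vectors[a], concept_vectors[b], dim=0
            )
            overlaps[(a, b)] = sim.item()
    return overlaps

# Hypothetical concept vectors living in a model's residual stream (dimension assumed).
hidden_dim = 4096
vectors = {name: torch.randn(hidden_dim) for name in ("red", "blue", "flower", "vase")}

for pair, overlap in cosine_overlap(vectors).items():
    print(pair, round(overlap, 3))

# In the study's framing, concept pairs with higher overlap would be the ones the
# model confuses more often; that claim can be tested by correlating these overlaps
# with per-pair error rates measured behaviorally.
```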
📝 Abstract
Vision-Language Models (VLMs) exhibit puzzling failures in multi-object visual tasks, such as hallucinating non-existent elements or failing to identify the most similar objects among distractors. While these errors mirror human cognitive constraints, such as the "Binding Problem", the internal mechanisms driving them in artificial systems remain poorly understood. Here, we provide mechanistic insight by analyzing the representational geometry of open-weight VLMs (Qwen, InternVL, Gemma), comparing methodologies to distill "concept vectors", latent directions encoding visual concepts. We validate our concept vectors via steering interventions that reliably manipulate model behavior in both simplified and naturalistic vision tasks (e.g., forcing the model to perceive a red flower as blue). We observe that the geometric overlap between these vectors strongly correlates with specific error patterns, offering a grounded quantitative framework to understand how internal representations shape model behavior and drive visual failures.
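A minimal sketch of the two ingredients the abstract mentions, concept-vector distillation and steering, is shown below. It assumes a difference-of-means construction and a PyTorch forward hook on a single decoder layer; the model name, layer index, steering scale, and helper names are illustrative assumptions, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def concept_vector(hidden_with: torch.Tensor, hidden_without: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction: mean activations over images containing the
    concept minus mean activations over images that do not, normalized to unit length.

    hidden_with / hidden_without: [num_samples, hidden_dim] activations collected
    at a chosen layer.
    """
    v = hidden_with.mean(dim=0) - hidden_without.mean(dim=0)
    return v / v.norm()

def add_steering_hook(layer: torch.nn.Module, direction: torch.Tensor, scale: float = 8.0):
    """Add `scale * direction` to the layer's output hidden states on every forward pass."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Hypothetical usage with a Hugging Face VLM; all specifics below are assumptions:
#   layer = model.model.layers[20]                  # assumed mid-network decoder layer
#   v_blue_minus_red = concept_vector(h_blue, h_red)  # activations gathered beforehand
#   handle = add_steering_hook(layer, v_blue_minus_red)
#   ...generate: the steered model now describes a red flower as blue...
#   handle.remove()                                  # restore unmodified behavior
```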