🤖 AI Summary
This study investigates the computational organization of neuron populations in vision-language models and its relationship to multimodal behavior. By constructing intra-layer co-activation correlation graphs and combining graph topological analysis, targeted perturbations, and behavioral association modeling, the work proposes neural topology as an intermediate scale for interpretability. The findings reveal that cross-modal neural structure evolves with network depth and converges onto a small set of recurrent hub neurons. These hubs encode behaviorally relevant signals that can be recovered from their activations, and targeted perturbation of these neurons substantially alters model outputs, demonstrating their causal influence on multimodal behavior.
📝 Abstract
Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activations. This view allows us to ask whether population-level structure is behaviorally meaningful, how it changes across modalities and depth, and whether it identifies causally influential internal components under intervention. We show that correlation topology carries recoverable behavioral signal; moreover, cross-modal structure progressively consolidates with depth around a compact set of recurrent hub neurons, whose targeted perturbation substantially alters model output. Neural topology thus emerges as a meaningful intermediate scale for VLM interpretability: richer than local attribution, more tractable than full circuit recovery, and empirically tied to multimodal behavior. Code is publicly available at https://github.com/he-h/vlm-graph-probing.
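The within-layer correlation graph described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact pipeline: the use of Pearson correlation over a sample-by-neuron activation matrix, the binarization threshold, and the degree-based hub criterion are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def coactivation_graph(acts, threshold=0.5):
    """Build a within-layer correlation graph from neuron activations.

    acts: (n_samples, n_neurons) activation matrix for one layer.
    Returns a binary adjacency matrix connecting neuron pairs whose
    absolute Pearson correlation exceeds `threshold` (an illustrative
    choice; the paper may weight or threshold edges differently).
    """
    corr = np.corrcoef(acts, rowvar=False)   # (n_neurons, n_neurons)
    np.fill_diagonal(corr, 0.0)              # drop self-correlations
    return (np.abs(corr) > threshold).astype(int)

def hub_neurons(adj, top_k=10):
    """Return indices of the top-k highest-degree neurons (hub candidates)."""
    degree = adj.sum(axis=1)
    return np.argsort(degree)[::-1][:top_k]

# Toy demo: 200 samples, 50 neurons, with the first 5 neurons
# sharing a common activation component so they co-activate.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 50))
acts[:, :5] += 2.0 * acts[:, [0]]
adj = coactivation_graph(acts, threshold=0.5)
hubs = hub_neurons(adj, top_k=5)  # recovers the co-activating group
```

A perturbation experiment in this framing would then zero or rescale the columns of `acts` indexed by `hubs` and measure the change in downstream model output.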