🤖 AI Summary
This study investigates the computational organization of neuron populations in vision-language models and its relationship to multimodal behavior. By constructing intra-layer co-activation correlation graphs and combining graph topological analysis, targeted perturbations, and behavioral association modeling, the work proposes neural topology as an intermediate scale for interpretability. The findings reveal that cross-modal neural structure evolves with network depth and converges onto a small set of recurrent hub neurons. These hubs encode behaviorally relevant signals that can be recovered from their activations, and targeted perturbation of these neurons substantially alters model outputs, demonstrating their causal influence on multimodal behavior.
📝 Abstract
Vision-language models (VLMs) achieve strong multimodal performance, yet how computation is organized across populations of neurons remains poorly understood. In this work, we study VLMs through the lens of neural topology, representing each layer as a within-layer correlation graph derived from neuron-neuron co-activations. This view allows us to ask whether population-level structure is behaviorally meaningful, how it changes across modalities and depth, and whether it identifies causally influential internal components under intervention. We show that correlation topology carries recoverable behavioral signal; moreover, cross-modal structure progressively consolidates with depth around a compact set of recurrent hub neurons, whose targeted perturbation substantially alters model output. Neural topology thus emerges as a meaningful intermediate scale for VLM interpretability: richer than local attribution, more tractable than full circuit recovery, and empirically tied to multimodal behavior. Code is publicly available at https://github.com/he-h/vlm-graph-probing.
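The within-layer correlation graph described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact pipeline: the use of Pearson correlation over a sample-by-neuron activation matrix, the binarization threshold, and the degree-based hub criterion are all illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def coactivation_graph(acts, threshold=0.5):
    """Build a within-layer correlation graph from neuron activations.

    acts: (n_samples, n_neurons) activation matrix for one layer.
    Returns a binary adjacency matrix connecting neuron pairs whose
    absolute Pearson correlation exceeds `threshold` (an illustrative
    choice; the paper may weight or threshold edges differently).
    """
    corr = np.corrcoef(acts, rowvar=False)   # (n_neurons, n_neurons)
    np.fill_diagonal(corr, 0.0)              # drop self-correlations
    return (np.abs(corr) > threshold).astype(int)

def hub_neurons(adj, top_k=10):
    """Return indices of the top-k highest-degree neurons (hub candidates)."""
    degree = adj.sum(axis=1)
    return np.argsort(degree)[::-1][:top_k]

# Toy demo: 200 samples, 50 neurons, with the first 5 neurons
# sharing a common activation component so they co-activate.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 50))
acts[:, :5] += 2.0 * acts[:, [0]]
adj = coactivation_graph(acts, threshold=0.5)
hubs = hub_neurons(adj, top_k=5)  # recovers the co-activating group
```

A perturbation experiment in this framing would then zero or rescale the columns of `acts` indexed by `hubs` and measure the change in downstream model output.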