From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing neuron analyses of vision-language models (VLMs) are largely confined to single-task settings and score neurons in isolation, overlooking how task-specific attention heads shape the write-in effects of feed-forward network (FFN) neurons. This oversight exacerbates neuron polysemanticity in multi-task scenarios and adds noise to both the identification of task-critical neurons and interventions on them. To address this, the work proposes HONES (Head-Oriented Neuron Explanation & Steering), a gradient-free framework that ranks FFN neurons by their causal write-in contributions conditioned on task-relevant attention heads, then modulates the salient neurons via lightweight scaling. Experiments on four multimodal tasks and two popular VLMs show that HONES outperforms existing methods at identifying task-critical neurons and improves model performance after steering.

📝 Abstract

Recent work has increasingly explored neuron-level interpretation in vision-language models (VLMs) to identify neurons critical to final predictions. However, existing neuron analyses generally focus on single tasks, limiting the comparability of neuron importance across tasks. Moreover, ranking strategies tend to score neurons in isolation, overlooking how task-dependent information pathways shape the write-in effects of feed-forward network (FFN) neurons. This oversight can exacerbate neuron polysemanticity in multi-task settings, introducing noise into the identification and intervention of task-critical neurons. In this study, we propose HONES (Head-Oriented Neuron Explanation & Steering), a gradient-free framework for task-aware neuron attribution and steering in multi-task VLMs. HONES ranks FFN neurons by their causal write-in contributions conditioned on task-relevant attention heads, and further modulates salient neurons via lightweight scaling. Experiments on four diverse multimodal tasks and two popular VLMs show that HONES outperforms existing methods in identifying task-critical neurons and improves model performance after steering. Our source code is released at: https://github.com/petergit1/HONES.
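
To make the two steps in the abstract concrete (head-conditioned ranking of FFN neurons, then lightweight scaling), here is a minimal NumPy sketch on a toy single-layer FFN. It is not the HONES implementation: the function names, the zero-ablation of task heads, the ReLU activation, and the use of a single `target_dir` vector to measure write-in effect are all assumptions chosen for illustration. The released repository should be consulted for the actual scoring rule.

```python
import numpy as np

def ffn_hidden(x, w_in):
    """ReLU hidden activations of a toy FFN layer (assumed activation)."""
    return np.maximum(x @ w_in, 0.0)

def head_conditioned_scores(resid, head_outs, task_heads, w_in, w_out, target_dir):
    """Hypothetical score: how much the chosen heads change each neuron's
    write along target_dir. Shapes: resid (d,), head_outs (n_heads, d),
    w_in (d, m), w_out (m, d), target_dir (d,)."""
    # FFN input with all heads vs. with the task heads zero-ablated (assumption).
    x_full = resid + head_outs.sum(axis=0)
    x_ablated = x_full - head_outs[task_heads].sum(axis=0)
    delta_act = ffn_hidden(x_full, w_in) - ffn_hidden(x_ablated, w_in)
    # Neuron i writes activation_i * w_out[i]; project each output row onto target_dir.
    write_proj = w_out @ target_dir
    return delta_act * write_proj

def steer(x, w_in, w_out, neuron_ids, alpha):
    """FFN output with the selected neurons' activations scaled by alpha."""
    h = ffn_hidden(x, w_in)
    h[neuron_ids] *= alpha
    return h @ w_out
```

In this sketch, the neurons to steer would be the top-scoring indices, for example `np.argsort(scores)[::-1][:k]`, passed to `steer` with `alpha > 1` to amplify or `alpha < 1` to suppress. No gradients are computed at any point, which matches the gradient-free framing of the abstract, though the scoring rule itself is only a stand-in.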

Problem

Research questions and friction points this paper is trying to address.

neuron attribution

multi-task vision-language models

neuron polysemanticity

task-aware interpretation

causal write-in effects

Innovation

Methods, ideas, or system contributions that make the work stand out.

causal attribution

neuron steering

multi-task vision-language models

attention-head conditioning

gradient-free interpretation

🔎 Similar Papers

What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Noise-free Text-Image Corruption and Evaluation

2024-06-24 | arXiv.org | Citations: 0

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

2024-02-09 | European Conference on Computer Vision | Citations: 29