🤖 AI Summary
Existing methods for neuron interpretation are constrained by predefined vocabularies or produce overly specific descriptions, limiting their ability to capture high-level, global concepts encoded in visual models. This work proposes a training-free, black-box iterative framework that leverages closed-loop interactions between a large language model and a text-to-image generator to dynamically generate and refine open-vocabulary concept labels based on neuron activation histories. The approach enables, for the first time, automatic discovery of neuron-associated concepts in an open-vocabulary setting, overcoming the limitations of fixed lexicons while supporting polysemy analysis and visual interpretability. Evaluated on ImageNet and Places365, the method achieves AUC improvements of up to 0.18 and 0.05, respectively, and uncovers, on average, 29% novel concepts missed by existing vocabularies, while matching the explanatory quality of gradient-based techniques.
📝 Abstract
Interpreting the concepts encoded by individual neurons in deep neural networks is a crucial step towards understanding their complex decision-making processes and ensuring AI safety. Despite recent progress in neuron labeling, existing methods often limit the search space to predefined concept vocabularies or produce overly specific descriptions that fail to capture higher-order, global concepts. We introduce LINE, a novel, training-free iterative approach tailored for open-vocabulary concept labeling in vision models. Operating in a strictly black-box setting, LINE leverages a large language model and a text-to-image generator to iteratively propose and refine concepts in a closed loop, guided by activation history. We demonstrate that LINE achieves state-of-the-art performance across multiple model architectures, yielding AUC improvements of up to 0.18 on ImageNet and 0.05 on Places365, while discovering, on average, 29% of new concepts missed by massive predefined vocabularies. Beyond identifying the top concept, LINE provides a complete generation history, which enables polysemanticity evaluation and produces supporting visual explanations that rival gradient-dependent activation maximization methods.
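The closed loop described in the abstract (an LLM proposes a concept label from the activation history, a text-to-image generator renders it, and the probed neuron's responses score it) can be sketched as below. This is a minimal illustration, not the paper's implementation: the function names (`llm_propose_concept`, `generate_images`, `neuron_activation`) and the toy scoring logic are hypothetical stand-ins, chosen only to show the control flow; a real run would plug an actual LLM, generator, and vision model behind the same black-box interfaces.

```python
# Hypothetical stand-ins for the three black-box components.
# LINE never needs gradients or internals, so any implementation
# with these signatures would fit the loop.

def llm_propose_concept(history):
    """Propose a concept label given the (concept, score) history."""
    if not history:
        return "object"  # initial broad guess
    best, _ = max(history, key=lambda h: h[1])
    # A real LLM would reason over the full history; here we just
    # mark the best concept so far as refined.
    return best + "-refined"

def generate_images(concept, n=4):
    """Stand-in for a text-to-image generator."""
    return [f"image_of_{concept}_{i}" for i in range(n)]

def neuron_activation(image):
    """Stand-in for the probed neuron's activation on one image."""
    return image.count("refined")  # toy proxy for activation strength

def label_neuron(num_iters=3):
    """Closed-loop search: propose -> generate -> score -> refine."""
    history = []
    for _ in range(num_iters):
        concept = llm_propose_concept(history)
        images = generate_images(concept)
        score = sum(neuron_activation(img) for img in images) / len(images)
        history.append((concept, score))
    # Return the top concept plus the full generation history,
    # which supports polysemanticity analysis.
    best = max(history, key=lambda h: h[1])
    return best, history
```

Keeping the whole history, rather than only the final label, is what allows the later polysemanticity evaluation: multiple high-scoring but semantically distinct concepts in the history signal a polysemantic neuron.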