Select, Hypothesize and Verify: Towards Verified Neuron Concept Interpretation

📅 2026-03-25
🤖 AI Summary
This work addresses a limitation of existing neuron-level concept explanation methods: they assume that every neuron has a clear functional role, and so overlook redundant or misleading neurons that can distort interpretations of model decision-making. To overcome this, the authors propose the Select-Hypothesize-Verify (SHV) framework, which first selects the most representative samples based on activation distributions, then generates natural language concept hypotheses, and finally validates these hypotheses by checking whether the generated concept highly activates the corresponding neuron. By adding this explicit verification stage, SHV identifies and focuses on neurons with genuine semantic meaning. Experimental results show that concepts produced by SHV activate the target neurons with a probability approximately 1.5 times that of the current state-of-the-art method.

📝 Abstract
Interpreting the functionality of neurons (also known as their concepts) is essential for understanding neural network decisions. Existing approaches describe neuron concepts by generating natural language descriptions, thereby advancing the understanding of the network's decision-making mechanism. However, these approaches assume that every neuron has a well-defined function and provides discriminative features for decision-making. In fact, some neurons may be redundant or may offer misleading concepts, so descriptions of such neurons can cause misinterpretation of the factors driving the network's decisions. To address this issue, we introduce a verification of neuron functions, which checks whether the generated concept highly activates the corresponding neuron. Furthermore, we propose a Select-Hypothesize-Verify framework for interpreting neuron functionality. This framework consists of: 1) selecting activation samples that best capture a neuron's well-defined functional behavior through activation-distribution analysis; 2) forming hypotheses about concepts for the selected neurons; and 3) verifying whether the generated concepts accurately reflect the functionality of the neuron. Extensive experiments show that our method produces more accurate neuron concepts: our generated concepts activate the corresponding neurons with a probability approximately 1.5 times that of the current state-of-the-art method.
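The selection and verification steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the selection rule or the verification criterion, so the top-k rule, the z-score test, the threshold of 2.0, and the function names `select_top_samples` and `verify_concept` are all assumptions made for illustration. Activations are taken as plain lists of floats already recorded for one neuron.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def select_top_samples(activations: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-activating samples.

    Assumed stand-in for the paper's activation-distribution analysis.
    """
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    return order[:k]


def verify_concept(concept_acts: Sequence[float],
                   reference_acts: Sequence[float],
                   z_threshold: float = 2.0) -> bool:
    """Accept a concept hypothesis only if samples depicting the concept
    activate the neuron well above its reference activation level.

    Assumed criterion: the mean activation on concept samples must exceed
    the reference mean by at least z_threshold reference standard deviations.
    """
    mu = mean(reference_acts)
    sigma = pstdev(reference_acts)
    if sigma == 0:
        # Degenerate reference distribution: fall back to a plain comparison.
        return mean(concept_acts) > mu
    return (mean(concept_acts) - mu) / sigma >= z_threshold


# Hypothetical recorded activations for one neuron on a reference image set.
reference = [0.1, 0.2, 0.0, 0.3, 0.1, 0.2]

# A concept whose samples strongly activate the neuron passes verification.
accepted = verify_concept([2.0, 1.8, 2.2], reference)

# A concept whose samples activate the neuron no more than the reference
# set is rejected, which flags the description as unreliable.
rejected = verify_concept([0.15, 0.2, 0.1], reference)
```

Under these assumptions, a neuron whose hypothesized concept fails the check would be treated as redundant or misleading rather than being given a description.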
Problem

Research questions and friction points this paper is trying to address.

neuron concept interpretation
neural network interpretability
misleading neuron concepts
redundant neurons
concept verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

neuron concept interpretation
Select-Hypothesize-Verify framework
concept verification
activation-distribution analysis
interpretable AI
ZeBin Ji
Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
Yang Hu
Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
Xiuli Bi
Professor of Computer Science, Chongqing University of Posts and Telecommunications
Image Processing, Pattern Recognition
Bo Liu
Associate Professor, Chongqing University of Posts and Telecommunications
Information Security, Multimedia Forensics, Image Processing
Bin Xiao
Meta GenAI
Computer Vision, Vision and Language, Machine Learning, Human Pose Estimation