🤖 AI Summary
Existing neuron-level interpretation methods suffer from fundamental limitations in completeness and human interpretability, as they ignore the high-dimensional, distributed nature of neural representations.
Method: Using an ImageNet-pretrained AlexNet, we systematically demonstrate that individual neurons are suboptimal explanatory bases and instead propose principal components (PCs)—statistically dominant, semantically clearer, low-dimensional orthogonal bases—as superior alternatives. We validate this via PCA, activation variance quantification, and user studies.
Results: The top-k high-variance PCs explain significantly more activation variance than the top-k neurons; moreover, PCs exhibit substantially higher functional impact and human comprehensibility. This work provides the first empirical evidence of PCs’ dual advantage in both completeness and interpretability, advocating a paradigm shift from neuron-centric to statistically grounded, low-dimensional structural explanations for deep neural networks.
📝 Abstract
High-quality explanations of neural networks (NNs) should exhibit two key properties. Completeness ensures that they accurately reflect a network's function, and interpretability makes them understandable to humans. Many existing methods provide explanations of individual neurons within a network. In this work, we provide evidence that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability compared to activation principal components. Neurons are a poor basis for AlexNet embeddings because they do not account for the distributed nature of these representations. By examining two quantitative measures of completeness and conducting a user study to measure interpretability, we show that the most important principal components provide more complete and interpretable explanations than the most important neurons. Much of the activation variance can be explained by examining relatively few high-variance PCs, as opposed to studying every neuron. These principal components also strongly affect network function and are significantly more interpretable than neurons. Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings and instead choose a basis, such as principal components, that accounts for the high-dimensional and distributed nature of a network's internal representations. Interactive demo and code available at https://ndey96.github.io/neuron-explanations-sacrifice.
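The completeness comparison above can be sketched numerically. The following is a minimal illustration (not the paper's code) using synthetic activations with a distributed, low-rank structure: it compares the fraction of activation variance captured by the top-k highest-variance neurons versus the top-k principal components. The matrix sizes and noise level are arbitrary assumptions for illustration; because the top-k PCs are the variance-optimal orthonormal basis, they always capture at least as much variance as any k individual neuron axes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a layer's activation matrix (n_samples x n_neurons).
# A low-rank mixing matrix plus noise gives the activations a distributed
# structure, loosely mimicking the embeddings discussed in the paper.
n_samples, n_neurons, latent_dim = 2000, 256, 16
latent = rng.normal(size=(n_samples, latent_dim))
mixing = rng.normal(size=(latent_dim, n_neurons))
acts = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_neurons))

centered = acts - acts.mean(axis=0)

# Fraction of total activation variance captured by the top-k neurons,
# i.e. the k individual units with the highest variance.
k = 10
neuron_var = np.sort(centered.var(axis=0))[::-1]
neuron_frac = neuron_var[:k].sum() / neuron_var.sum()

# Fraction captured by the top-k principal components, via SVD.
# Squared singular values / n_samples are the per-PC variances, and
# their sum equals the total activation variance.
sing_vals = np.linalg.svd(centered, compute_uv=False)
pc_var = sing_vals**2 / n_samples
pc_frac = pc_var[:k].sum() / pc_var.sum()

print(f"top-{k} neurons explain {neuron_frac:.1%} of activation variance")
print(f"top-{k} PCs explain {pc_frac:.1%} of activation variance")
assert pc_frac >= neuron_frac  # PCs are the variance-optimal basis
```

On real network activations, the paper's measures of completeness go beyond raw variance (e.g. functional impact on the network), but this variance gap is the core statistical reason few PCs suffice where many neurons are needed.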