🤖 AI Summary
Convolutional neural network (CNN) channels often exhibit polysemy: a single channel encodes multiple semantic concepts, which severely undermines interpretability. To address this, we propose a weight reconstruction algorithm that achieves the first structural disentanglement of polysemous channels. Our method clusters activation patterns from preceding layers to identify heterogeneous response modes, decomposes each original channel into multiple semantically specialized sub-channels, and performs reparameterization via feature response analysis and convolutional kernel remapping. Crucially, the approach preserves the original network architecture and enables editable, mechanism-level explanations. Evaluated on an ImageNet subset, it increases per-channel semantic purity by 63% on average, substantially improving feature visualization quality, concept localization accuracy, and attribution reliability. This work establishes a novel paradigm for channel-level interpretability in CNNs.
📝 Abstract
Mechanistic interpretability is concerned with analyzing individual components of a convolutional neural network (CNN) and how they form larger circuits that represent decision mechanisms. Such analyses are difficult because CNNs frequently learn polysemantic channels, i.e., channels that encode several distinct concepts at once and are therefore hard to interpret. To address this, we propose an algorithm that disentangles a specific kind of polysemantic channel into multiple channels, each responding to a single concept. Our approach restructures the weights of a CNN, exploiting the observation that different concepts within the same channel exhibit distinct activation patterns in the previous layer. By disentangling these polysemantic features, we enhance the interpretability of CNNs and thereby improve explanatory techniques such as feature visualizations.
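The core idea, clustering previous-layer activation patterns and splitting one channel's kernel into concept-specific sub-kernels, can be sketched roughly as below. The function name, tensor shapes, the soft input-channel masking, and the k-means clustering step are illustrative assumptions for this sketch, not the paper's exact procedure.

```python
import numpy as np

def disentangle_channel(kernel, prev_acts, n_concepts=2, n_iter=20, seed=0):
    """Split one conv channel's kernel into concept-specific sub-kernels (sketch).

    kernel:    (C_in, k, k) weights of the polysemantic channel (assumed layout).
    prev_acts: (N, C_in) per-input-channel mean activations of the previous
               layer, collected on inputs that strongly activate the channel.
    Returns a list of n_concepts sub-kernels, each with kernel's shape.
    """
    rng = np.random.default_rng(seed)

    # Tiny k-means over previous-layer activation patterns: each cluster is
    # taken to correspond to one concept the channel responds to.
    centers = prev_acts[rng.choice(len(prev_acts), n_concepts, replace=False)]
    for _ in range(n_iter):
        dists = ((prev_acts[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(n_concepts):
            if (labels == j).any():
                centers[j] = prev_acts[labels == j].mean(0)

    # Kernel remapping: weight each input channel by that cluster's share of
    # the total mean activation, yielding a soft input-channel mask per
    # concept. The masks sum to ~1, so the sub-kernels sum back to the
    # original kernel (the decomposition preserves the total response).
    subs = []
    total = centers.sum(0) + 1e-8
    for j in range(n_concepts):
        profile = centers[j] / total            # (C_in,) soft mask
        subs.append(kernel * profile[:, None, None])
    return subs
```

In an actual network, each sub-kernel would become its own channel (with downstream weights duplicated accordingly), which is how the architecture-preserving reparameterization could keep the layer's overall function unchanged while making each new channel respond to a single concept.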