Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations

📅 2023-03-19
🏛️ IEEE Transactions on Artificial Intelligence
📈 Citations: 5 (influential: 0)
🤖 AI Summary
This work addresses the lack of human-interpretable concepts in the intermediate-layer representations of CNNs. We propose an unsupervised post-hoc method that optimizes an orthogonal rotation of the feature space, chosen so that it explains sparse, one-hot thresholded transformations of pixel activations, to extract a disentangled, concept-level interpretable basis. Unlike supervised approaches that rely on manually annotated concept datasets, our method requires no concept labels. We further extend existing basis interpretability metrics and validate the method across multiple CNN architectures and training datasets, showing that intermediate-layer representations become more interpretable once transformed to the extracted bases. Comparing against a supervised basis-extraction approach, we find that in one respect the unsupervised method has a strength that constitutes a limitation of the supervised paradigm, and we outline directions for future research.
📝 Abstract
An important line of research attempts to explain convolutional neural network (CNN) image classifier predictions and intermediate-layer representations in terms of human-understandable concepts. In this work, we build on previous works that use annotated concept datasets to extract interpretable feature-space directions and propose an unsupervised post-hoc method to extract a disentangling interpretable basis by looking for the rotation of the feature space that explains sparse, one-hot thresholded transformed representations of pixel activations. We experiment with popular existing CNNs and demonstrate the effectiveness of our method in extracting an interpretable basis across network architectures and training datasets. We extend the basis interpretability metrics found in the literature and show that intermediate-layer representations become more interpretable when transformed to the bases extracted with our method. Finally, using these metrics, we compare the bases extracted with our method to those derived with a supervised approach and find that, in one aspect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one; we give potential directions for future research.
Problem

Research questions and friction points this paper is trying to address.

Extracting interpretable basis directions from CNN feature spaces without concept supervision
Identifying interpretable feature directions collectively using sparsity optimization instead of annotations
Enhancing interpretability of intermediate layer representations through unsupervised basis transformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised, bottom-up approach that identifies interpretable directions collectively
Optimizes a sparsity objective to learn an interpretable basis (see the sketch after this list)
Extracts an interpretable basis without relying on concept labels
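
To make the sparsity idea concrete, here is a minimal sketch of learning a sparsifying orthogonal rotation of a feature space in PyTorch. Everything in it is an illustrative assumption: the `RotatedBasis` module, the `fit_rotation` helper, the plain L1 penalty, and the hyperparameters are stand-ins, not the paper's actual objective, which fits sparse one-hot thresholded activation responses.

```python
# Minimal sketch (not the paper's exact method): learn an orthogonal rotation
# of a CNN feature space so that the rotated activation responses become sparse.
import torch
import torch.nn as nn

class RotatedBasis(nn.Module):
    """Orthogonal change of basis for C-dimensional feature vectors."""
    def __init__(self, dim: int):
        super().__init__()
        self.rot = nn.Linear(dim, dim, bias=False)
        # Keep the weight matrix orthogonal throughout optimization.
        nn.utils.parametrizations.orthogonal(self.rot)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, C) pixel-level activations from an intermediate layer.
        return self.rot(feats)

def fit_rotation(feats: torch.Tensor, steps: int = 1000, lr: float = 1e-2) -> torch.Tensor:
    """Return a (C, C) orthogonal matrix whose rows act as basis directions."""
    model = RotatedBasis(feats.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        z = model(feats)
        # L1 sparsity surrogate: concentrates each pixel's response energy on
        # few directions. The paper instead explains sparse one-hot thresholded
        # representations of the activations.
        loss = z.abs().mean()
        loss.backward()
        opt.step()
    return model.rot.weight.detach()

# Hypothetical usage: flatten conv activations to (num_pixels, channels) first.
# R = fit_rotation(torch.randn(4096, 512))
```

Because an orthogonal rotation preserves the Euclidean norm of each feature vector, minimizing the L1 penalty cannot collapse the activations to zero; it can only redistribute their energy onto fewer basis directions, which is the sense in which the learned rotation sparsifies.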
Alexandros Doumanoglou
Visual Computing Lab, Information Technologies Institute, Centre For Research and Technology HELLAS
Computer Vision · Machine Learning · Computer Graphics · Explainable AI
S. Asteriadis
Department of Advanced Computing Sciences, University of Maastricht, Maastricht, Netherlands
D. Zarpalas
Information Technologies Institute (ITI), Centre for Research and Technology HELLAS (CERTH), Thessaloniki, Greece