🤖 AI Summary
This work addresses the lack of human-interpretable concepts in the intermediate-layer representations of CNNs. We propose an unsupervised post-hoc method that optimizes an orthogonal rotation of the feature space to extract a disentangled, concept-level interpretable basis from sparsely thresholded activation responses. Unlike supervised approaches that rely on manually annotated concept datasets, ours is a purely unsupervised paradigm for discovering highly interpretable bases. We further introduce improved interpretability metrics and a concept-alignment analysis framework, and validate the method across multiple CNN architectures and training datasets. Experiments show that intermediate-layer representations become more interpretable when transformed to the rotated bases, and that in one respect, conceptual breadth, the unsupervised bases surpass those extracted with supervised methods. These results reveal an inherent limitation of supervised paradigms, namely their restricted coverage of concepts, and open a new direction for model interpretability research.
📝 Abstract
An important line of research attempts to explain the predictions and intermediate-layer representations of convolutional neural network (CNN) image classifiers in terms of human-understandable concepts. In this work, we build on prior work that uses annotated concept datasets to extract interpretable feature-space directions, and we propose an unsupervised post-hoc method that extracts a disentangling, interpretable basis by searching for the rotation of the feature space that best explains sparse, one-hot thresholded transformations of pixel activations. We experiment with popular existing CNNs and demonstrate that our method extracts an interpretable basis across network architectures and training datasets. We extend the basis interpretability metrics found in the literature and show that intermediate-layer representations become more interpretable when transformed to the bases extracted with our method. Finally, using these interpretability metrics, we compare the bases extracted with our method against bases derived with a supervised approach, find that in one respect the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one, and suggest directions for future research.
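To make the core idea concrete, here is a minimal, hypothetical sketch of the kind of optimization the abstract describes: learning an orthogonal rotation of a feature space so that soft-thresholded responses in the rotated basis become sparse. All names, the synthetic data, the sigmoid soft threshold, and the sparsity loss are illustrative assumptions for exposition, not the paper's exact objective or implementation.

```python
import torch

torch.manual_seed(0)
d, n = 8, 512

# Synthetic stand-in for intermediate features: sparse non-negative
# "concept" responses mixed by a random orthogonal matrix.
Q, _ = torch.linalg.qr(torch.randn(d, d))
S = torch.relu(torch.randn(n, d) - 1.0)   # sparse sources
Z = S @ Q.T                               # observed (entangled) features

# Unconstrained parameter; the rotation is derived from it below.
A = torch.zeros(d, d, requires_grad=True)
opt = torch.optim.Adam([A], lr=1e-2)

def rotation(A):
    # The matrix exponential of a skew-symmetric matrix is orthogonal,
    # so R remains a valid rotation throughout training.
    return torch.matrix_exp(A - A.T)

for _ in range(200):
    R = rotation(A)
    # Soft one-hot-style thresholding of rotated features (assumed form).
    H = torch.sigmoid((Z @ R.T - 0.5) / 0.1)
    # Sparsity objective: few active thresholded units per sample.
    loss = H.sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

R = rotation(A).detach()
# Orthogonality holds by construction of the parametrization.
orth_err = (R @ R.T - torch.eye(d)).abs().max().item()
```

The skew-symmetric parametrization is one common way to keep the learned basis exactly orthogonal during gradient descent without an explicit projection step.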