Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification

📅 2024-11-08
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing CNN interpretability methods suffer from a fundamental disconnect: saliency-based approaches (e.g., Grad-CAM) yield pixel-level localization but lack semantic concept grounding, while concept-based methods (e.g., TCAV) quantify model sensitivity to abstract, human-understandable concepts (e.g., "striped") yet cannot attribute them to individual predictions or localize concept activations spatially. This work proposes the first unified framework to jointly enable concept-level attribution and pixel-level localization. It integrates TCAV's Concept Activation Vectors with a generalization of Integrated Gradients, computing concept attribution scores and their corresponding spatial heatmaps within a single forward-backward pass. Evaluated on ResNet and VGG architectures against human-annotated concept localization benchmarks, the method significantly outperforms both TCAV and Grad-CAM, improving alignment between concept attributions and human judgments by 37% and bridging the critical gap between local saliency maps and global semantic explanations.
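The spatial-heatmap idea can be illustrated with a toy sketch (not the paper's implementation): given a convolutional feature map and a unit-norm CAV, projecting each spatial position's channel vector onto the CAV yields a per-location concept heatmap. The function name, shapes, and the clipping of negative alignment are illustrative assumptions.

```python
import numpy as np

def concept_saliency(feature_map, cav):
    """Project each spatial location's channel vector onto a unit CAV.

    feature_map: (H, W, C) activations from a chosen conv layer.
    cav:         (C,) unit-norm Concept Activation Vector.
    Returns an (H, W) heatmap; positive values mark where the concept
    direction is active (negative alignment is clipped to zero).
    """
    heat = np.tensordot(feature_map, cav, axes=([2], [0]))  # (H, W)
    return np.maximum(heat, 0.0)

# toy example: a 4x4 map with 3 channels, concept direction along channel 0
cav = np.array([1.0, 0.0, 0.0])
fmap = np.zeros((4, 4, 3))
fmap[1, 2] = [2.0, 0.5, -1.0]   # one location strongly expresses the concept
heat = concept_saliency(fmap, cav)
```

Upsampling such a heatmap to the input resolution (as saliency methods like Grad-CAM do) would give the pixel-level localization the summary describes.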

πŸ“ Abstract
Convolutional Neural Networks (CNNs) have seen significant performance improvements in recent years. However, due to their size and complexity, they function as black boxes, leading to transparency concerns. State-of-the-art saliency methods generate local explanations that highlight the area in the input image where a class is identified, but cannot explain how a concept of interest contributes to the prediction, which is essential for bias mitigation. On the other hand, concept-based methods, such as TCAV (Testing with Concept Activation Vectors), provide insights into how sensitive the network is to a concept, but cannot compute its attribution in a specific prediction nor show its location within the input image. This paper introduces a novel post-hoc explainability framework, Visual-TCAV, which aims to bridge the gap between these methods by providing both local and global explanations for CNN-based image classification. Visual-TCAV uses Concept Activation Vectors (CAVs) to generate saliency maps that show where concepts are recognized by the network. Moreover, it can estimate the attribution of these concepts to the output of any class using a generalization of Integrated Gradients. This framework is evaluated on popular CNN architectures, with its validity further confirmed via experiments where the ground truth for explanations is known, and a comparison with TCAV. Our code will be made available soon.
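For context, a CAV is typically obtained by training a linear classifier that separates layer activations of concept examples from activations of random examples; the vector orthogonal to the decision boundary is the CAV. A minimal numpy sketch on synthetic activations (the plain logistic-regression trainer and all names here are illustrative, not the paper's code):

```python
import numpy as np

def train_cav(concept_acts, random_acts, lr=0.1, epochs=200):
    """Fit a logistic-regression separator between concept and random
    activations; return the unit-normalized weight vector as the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)          # gradient of log-loss
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)

# toy activations: concept examples are shifted along the first dimension
rng = np.random.default_rng(0)
concept = rng.normal(0, 1, (50, 8)) + np.array([3.0, 0, 0, 0, 0, 0, 0, 0])
random_ = rng.normal(0, 1, (50, 8))
cav = train_cav(concept, random_)   # points roughly along dimension 0
```

In practice the activations would come from a chosen layer of the CNN under inspection rather than a synthetic generator.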
Problem

Research questions and friction points this paper is trying to address.

Bridges the gap between local and global CNN explanations
Generates saliency maps showing concept recognition locations
Estimates concept attribution using Integrated Gradients generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines local and global CNN explanations
Uses CAVs to generate concept saliency maps
Generalizes Integrated Gradients for concept attribution
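The Integrated-Gradients idea behind the attribution scores can be sketched as follows: integrate the gradient of the class score along a straight path in activation space from a baseline to the actual activations, then project the resulting attribution vector onto the CAV. This is a hedged sketch of the general technique; `grad_fn`, the zero baseline, and the projection step are assumptions, not the paper's exact formulation.

```python
import numpy as np

def concept_ig_attribution(act, grad_fn, cav, steps=50):
    """Approximate Integrated Gradients over a layer's activations and
    project the attribution onto a concept direction.

    act:     (C,) activation vector at the chosen layer.
    grad_fn: callable mapping activations to d(class score)/d(activations).
    cav:     (C,) unit-norm Concept Activation Vector.
    """
    baseline = np.zeros_like(act)
    # midpoint Riemann sum over the straight path baseline -> act
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean(
        [grad_fn(baseline + a * (act - baseline)) for a in alphas], axis=0
    )
    ig = (act - baseline) * avg_grad        # per-unit IG attributions
    return float(ig @ cav)                  # concept-level attribution score

# toy check with a linear class score s(a) = w @ a, so the gradient is w
w = np.array([1.0, 0.0, 0.0, 0.0])
cav = np.array([1.0, 0.0, 0.0, 0.0])
score = concept_ig_attribution(np.array([3.0, 1.0, -2.0, 0.5]), lambda a: w, cav)
```

For a linear score the path integral is exact, so the score reduces to the activation along the concept direction times its weight; for a real CNN, `grad_fn` would backpropagate the class logit to the chosen layer.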