Concept-Based Explainable Artificial Intelligence: Metrics and Benchmarks

📅 2025-01-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of a trustworthy evaluation foundation for concept-level interpretable AI models, particularly Concept Bottleneck Models (CBMs). We propose the first standardized evaluation framework, introducing three quantitative metrics (concept importance, concept existence, and localization accuracy) alongside a Concept Activation Mapping (CAM) visualization technique. Empirical analysis reveals that mainstream post-hoc CBMs frequently suffer from concept misidentification (treating nonexistent concepts as critical), heatmap over-activation, and severe spatial misalignment, and identifies inherent inter-concept correlations as the root cause of these biases. Crucially, this study provides the first empirical evidence of widespread concept absence and substantial localization errors for post-hoc concepts in images, challenging prevailing assumptions about concept fidelity in explainable AI. Our framework establishes both theoretical grounding and practical tools for developing trustworthy concept-based models.
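
The summary does not spell out how concept activation maps are computed; as a minimal sketch of the general CAM idea, assuming post-hoc concepts are linear directions in the backbone's spatial feature space (the function name and tensor shapes below are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def concept_activation_map(feature_maps: np.ndarray,
                           concept_vector: np.ndarray) -> np.ndarray:
    """Project a concept direction onto spatial feature maps.

    feature_maps:   backbone activations, shape (C, H, W)
    concept_vector: concept direction in feature space, shape (C,)
    Returns an (H, W) heatmap, min-max normalised to [0, 1].
    """
    # Score each spatial location by its alignment with the concept
    # direction (weighted sum over channels).
    heatmap = np.tensordot(concept_vector, feature_maps, axes=([0], [0]))
    heatmap = np.maximum(heatmap, 0.0)  # keep positive evidence only
    span = heatmap.max() - heatmap.min()
    return (heatmap - heatmap.min()) / span if span > 0 else np.zeros_like(heatmap)
```

Upsampling such a heatmap to the input resolution yields a per-concept saliency map that can then be compared against annotated concept regions.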

📝 Abstract
Concept-based explanation methods, such as concept bottleneck models (CBMs), aim to improve the interpretability of machine learning models by linking their decisions to human-understandable concepts, under the critical assumption that such concepts can be accurately attributed to the network's feature space. However, this foundational assumption has not been rigorously validated, mainly because the field lacks standardised metrics and benchmarks to assess the existence and spatial alignment of such concepts. To address this, we propose three metrics: the concept global importance metric, the concept existence metric, and the concept location metric, along with a technique for visualising concept activations, i.e., concept activation mapping. We benchmark post-hoc CBMs to illustrate their capabilities and challenges. Through qualitative and quantitative experiments, we demonstrate that, in many cases, even the most important concepts determined by post-hoc CBMs are not present in input images; moreover, when they are present, their saliency maps fail to align with the expected regions by either activating across an entire object or misidentifying relevant concept-specific regions. We analyse the root causes of these limitations, such as the natural correlation of concepts. Our findings underscore the need for more careful application of concept-based explanation techniques, especially in settings where spatial interpretability is critical.
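
The abstract names the three metrics without giving their formulas; as a rough illustration of what existence and localisation checks could look like, assuming binary concept-presence labels and ground-truth region masks are available (the function names and the IoU-based formulation are assumptions for illustration, not the paper's definitions):

```python
import numpy as np

def concept_existence_score(top_concepts: list[str],
                            present_concepts: set[str]) -> float:
    """Fraction of the model's top-ranked concepts that annotators
    actually marked as present in the image."""
    return sum(c in present_concepts for c in top_concepts) / len(top_concepts)

def concept_localization_iou(heatmap: np.ndarray,
                             region_mask: np.ndarray,
                             threshold: float = 0.5) -> float:
    """IoU between the thresholded concept heatmap and the annotated
    concept-specific region."""
    pred = heatmap >= threshold
    inter = np.logical_and(pred, region_mask).sum()
    union = np.logical_or(pred, region_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0
```

Low existence scores for a model's most important concepts, or low IoU for concepts that are present, would correspond to the failure modes the paper reports.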
Problem

Research questions and friction points this paper is trying to address.

Explainable AI
Concept Bottleneck Models
Human-understandable Concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-based Explanation Methods
Evaluation Criteria
Visualization Tool