The Weighting Game: Evaluating Quality of Explainability Methods

📅 2022-08-12
🏛️ Scandinavian Conference on Image Analysis
📈 Citations: 5
Influential: 0
🤖 AI Summary
This paper addresses the evaluation of explanation-heatmap quality in image classification, proposing an assessment paradigm that jointly considers accuracy and stability. For accuracy, the authors introduce the Weighting Game, a quantitative metric measuring how well class-guided explanations align with ground-truth semantic segmentation masks. For stability, they propose a similarity comparison based on geometric transformations (scaling and translation) that quantifies heatmap robustness under input perturbations. The framework combines class activation mapping (CAM), segmentation-mask matching, and statistical analysis, and is evaluated systematically across mainstream CAM methods and diverse model architectures. Experiments show that explanation quality is strongly architecture-dependent, offering empirical guidance for choosing an interpretability method. The work moves heatmap evaluation from qualitative inspection toward a reproducible, comparable, and quantitative paradigm.
📝 Abstract
The objective of this paper is to assess the quality of explanation heatmaps for image classification tasks. To assess the quality of explainability methods, we approach the task through the lens of accuracy and stability. In this work, we make the following contributions. Firstly, we introduce the Weighting Game, which measures how much of a class-guided explanation is contained within the correct class' segmentation mask. Secondly, we introduce a metric for explanation stability, using zooming/panning transformations to measure differences between saliency maps with similar contents. Quantitative experiments are produced, using these new metrics, to evaluate the quality of explanations provided by commonly used CAM methods. The quality of explanations is also contrasted between different model architectures, with findings highlighting the need to consider model architecture when choosing an explainability method.
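The Weighting Game described above can be sketched in a few lines: score a heatmap by the fraction of its (non-negative) mass that falls inside the target class's segmentation mask. This is an illustrative reading of the abstract, not the authors' reference implementation; the function name and normalization choices are assumptions.

```python
import numpy as np

def weighting_game(heatmap, mask):
    """Fraction of total non-negative heatmap mass inside the
    ground-truth segmentation mask for the target class.

    Illustrative sketch only: `heatmap` is a 2-D saliency map,
    `mask` a boolean array of the same shape.
    """
    h = np.clip(heatmap, 0, None)   # ignore negative attributions
    total = h.sum()
    if total == 0:
        return 0.0                  # degenerate all-zero explanation
    return float(h[mask].sum() / total)
```

A perfectly contained explanation scores 1.0; mass spilling outside the mask lowers the score toward 0.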
Problem

Research questions and friction points this paper is trying to address.

Evaluating the quality of explanation heatmaps for image classification
Assessing explainability methods via quantitative accuracy and stability metrics
Measuring how much of an explanation is contained within the correct class's segmentation mask
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the Weighting Game, a metric scoring how much of a class-guided explanation lies inside the ground-truth segmentation mask
Measures explanation stability via zooming/panning transformations of the input
Compares explanation quality of common CAM methods across model architectures, showing it is architecture-dependent
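The stability idea above can be sketched as: perturb the input with a known geometric transform, recompute the saliency map, undo the transform, and compare the two maps over their shared region. The translation variant below uses Pearson correlation as the similarity measure; the function names, margin handling, and choice of correlation are assumptions, not the paper's implementation.

```python
import numpy as np

def stability_score(saliency_fn, image, dx=4, dy=4):
    """Translation-based stability check (illustrative sketch).

    `saliency_fn` maps a 2-D image to a 2-D heatmap. Shift the input
    by (dy, dx), explain it, shift the explanation back, and compare
    with the original explanation via Pearson correlation.
    """
    base = saliency_fn(image)
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    realigned = np.roll(saliency_fn(shifted), shift=(-dy, -dx), axis=(0, 1))
    # Drop a border margin contaminated by np.roll's wrap-around.
    m = max(abs(dx), abs(dy))
    a = base[m:-m, m:-m].ravel()
    b = realigned[m:-m, m:-m].ravel()
    # 1.0 means the explanation is perfectly stable under the shift.
    return float(np.corrcoef(a, b)[0, 1])
```

A translation-equivariant explainer scores exactly 1.0; unstable explainers score lower.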