How Can One Choose the Best CAM-Based Explainability Method for a CNN Model?

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of effective evaluation criteria for Class Activation Mapping (CAM)-based interpretability methods that align with human perception. To bridge this gap, the authors propose a systematic assessment framework grounded in human perceptual judgments, integrating crowdsourced human preference rankings with saliency map–bounding box alignment metrics. They evaluate multiple CAM variants—including LayerCAM, Score-CAM, and IS-CAM—using Rank-Biased Overlap (RBO) to quantify the consistency between different distance measures and human assessments. Their analysis reveals that the Manhattan distance and Pearson correlation coefficient best capture human preferences, leading to the identification of LayerCAM, Score-CAM, and IS-CAM as the most human-aligned interpretability methods among those tested.
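
To make the ranking comparison concrete, here is a minimal sketch of a truncated Rank-Biased Overlap score. It is an illustration, not the authors' implementation. The function name `rbo_truncated`, the persistence value `p=0.9`, and the example rankings are all assumptions. The sketch omits the extrapolation step of full RBO, so two identical rankings of length k score 1 - p^k rather than 1.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-Biased Overlap of two rankings (no extrapolation).

    At each depth d, the agreement is the fraction of items shared by the
    top-d prefixes; agreements are weighted geometrically by p**(d-1), so
    the top of the ranking counts more than the bottom.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        overlap = len(seen_a & seen_b)  # items common to both top-d prefixes
        score += (p ** (d - 1)) * overlap / d
    return (1 - p) * score


# Hypothetical rankings, for illustration only (not results from the study).
human_ranking = ["LayerCAM", "Score-CAM", "IS-CAM"]
metric_ranking = ["Score-CAM", "LayerCAM", "IS-CAM"]
similarity = rbo_truncated(human_ranking, metric_ranking)
```

A higher score means that a distance metric orders the explainability methods more like the crowd does, with disagreements near the top of the ranking penalised most.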

📝 Abstract

Deep Learning has advanced considerably in recent years, with surprising results. Its models are increasingly used in numerous applications, including those that affect human life, which require clear explanations and justifications. Various explainability methods have been proposed, but few metrics exist to evaluate them. The most commonly used metric is Intersection over Union (IoU). However, the outputs of explainability methods, called saliency maps, do not have a known shape, so we hypothesise that there must be a better metric for finding the explainability method whose results most closely resemble human perception. To find such a metric, we propose assessing the similarity between human perception and explanation saliency maps using several different metrics. We conducted an investigation on a subset of the Chihuahua images from the ImageNet dataset. Several CAM-based explainability methods were used to generate saliency maps for each Chihuahua image. Alignment was measured by applying distance metrics between the bounding boxes from human annotations and the saliency maps produced by each explainability method. The distance results were used to rank the saliency maps, and these rankings were compared with a ranking built from people's choices, collected through crowdsourcing, of the best explanation saliency map for each selected image. The rankings were compared using the Rank-Biased Overlap (RBO) metric. The results indicate that our method can feasibly identify the explainability method that best resembles human perception. In our experiments, the two metrics that best matched human perception were Manhattan and Correlation. In addition, the best explainability methods with respect to human perception were LayerCAM, Score-CAM, and IS-CAM.
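
The alignment step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that each saliency map is a 2-D array scaled to [0, 1] with the same size as the image, that the annotated bounding box is rasterised into a binary mask, and that the Correlation metric is one minus the Pearson coefficient. All function names and the toy data are hypothetical.

```python
import numpy as np


def bbox_mask(shape, box):
    """Binary mask with ones inside box = (row0, col0, row1, col1), end-exclusive."""
    mask = np.zeros(shape, dtype=float)
    r0, c0, r1, c1 = box
    mask[r0:r1, c0:c1] = 1.0
    return mask


def manhattan_distance(saliency, mask):
    """Sum of absolute pixel-wise differences (L1 distance)."""
    return float(np.abs(saliency - mask).sum())


def correlation_distance(saliency, mask):
    """One minus the Pearson correlation of the flattened arrays (assumed definition)."""
    r = np.corrcoef(saliency.ravel(), mask.ravel())[0, 1]
    return float(1.0 - r)


def rank_methods(saliency_by_method, mask, distance):
    """Order method names from smallest to largest distance to the mask."""
    return sorted(
        saliency_by_method,
        key=lambda name: distance(saliency_by_method[name], mask),
    )


# Toy 4x4 example with made-up maps; "close" and "far" are placeholder names.
mask = bbox_mask((4, 4), (1, 1, 3, 3))
maps = {"far": 1.0 - mask, "close": mask.copy()}
ranking = rank_methods(maps, mask, manhattan_distance)
```

A ranking produced this way for each distance metric can then be compared with the crowdsourced ranking using RBO. Note that a constant saliency map has zero variance, so the correlation distance is undefined for it.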

Problem

Research questions and friction points this paper is trying to address.

Explainability

Saliency Maps

Human Perception

CAM-based Methods

Evaluation Metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

saliency map evaluation

human perception alignment

CAM-based explainability