Evaluating Explainability: A Framework for Systematic Assessment and Reporting of Explainable AI Features

📅 2025-06-16
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
A systematic, quantitative framework for evaluating the quality of explainable artificial intelligence (XAI) explanations, particularly one balancing scientific rigor and clinical utility, is currently lacking. Method: The authors propose a four-criteria XAI assessment framework for medical AI, evaluating consistency, plausibility, fidelity, and usefulness, supported by a standardized scorecard for reporting. Evaluation is illustrated with ablation-based class activation mapping (Ablation CAM) and Eigen CAM heatmaps, applying multi-criteria quantitative assessment to breast lesion detection on synthetic mammograms. Contribution/Results: The first three criteria are evaluated in clinically relevant breast lesion detection scenarios, yielding reportable, reproducible assessments. This work advances XAI evaluation from qualitative interpretation toward a quantitative, clinically verifiable paradigm.
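For orientation, heatmaps of the two kinds evaluated in the paper can be produced with the open-source pytorch-grad-cam package. The following is a minimal sketch, not the authors' implementation: the ResNet backbone, target layer, and random input tensor are placeholders standing in for a trained lesion detector and a mammogram.

```python
# Minimal sketch: producing Ablation-CAM and Eigen-CAM heatmaps with the
# open-source pytorch-grad-cam package. Model, layer, and input are placeholders.
import torch
from torchvision.models import resnet50
from pytorch_grad_cam import AblationCAM, EigenCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = resnet50(weights=None).eval()    # untrained stand-in for a lesion detector
target_layers = [model.layer4[-1]]       # last conv block, a common choice
image = torch.randn(1, 3, 224, 224)      # placeholder mammogram tensor

# Ablation-CAM: weights each activation channel by the score drop when ablated.
ablation_cam = AblationCAM(model=model, target_layers=target_layers)
ablation_map = ablation_cam(input_tensor=image,
                            targets=[ClassifierOutputTarget(0)])[0]  # (H, W) in [0, 1]

# Eigen-CAM: first principal component of the activations; class-agnostic.
eigen_cam = EigenCAM(model=model, target_layers=target_layers)
eigen_map = eigen_cam(input_tensor=image)[0]
```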

📝 Abstract
Explainability features are intended to provide insight into the internal mechanisms of an AI device, but there is a lack of evaluation techniques for assessing the quality of provided explanations. We propose a framework to assess and report explainable AI features. Our evaluation framework for AI explainability is based on four criteria: 1) Consistency quantifies the variability of explanations to similar inputs, 2) Plausibility estimates how close the explanation is to the ground truth, 3) Fidelity assesses the alignment between the explanation and the model's internal mechanisms, and 4) Usefulness evaluates the impact of the explanation on task performance. Finally, we developed a scorecard for AI explainability methods that serves as a complete description and evaluation to accompany this type of algorithm. We describe these four criteria and give examples of how they can be evaluated. As a case study, we use Ablation CAM and Eigen CAM to illustrate the evaluation of explanation heatmaps on the detection of breast lesions in synthetic mammograms. The first three criteria are evaluated for clinically relevant scenarios. Our proposed framework establishes criteria through which the quality of explanations provided by AI models can be evaluated. We intend for our framework to spark a dialogue regarding the value provided by explainability features and help improve the development and evaluation of AI-based medical devices.
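To make the first two criteria concrete, here is a hedged sketch of how they might be scored for heatmap explanations. The Gaussian perturbation scheme, Pearson correlation, and IoU threshold are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative scoring of two criteria for heatmap explanations. The noise
# model, correlation metric, and threshold are assumptions for this sketch.
import numpy as np

def consistency(explain_fn, image, n_trials=10, noise_std=0.01, seed=0):
    """Mean Pearson correlation between the heatmap of `image` and heatmaps
    of slightly perturbed copies; 1.0 means perfectly stable explanations."""
    rng = np.random.default_rng(seed)
    reference = explain_fn(image).ravel()
    scores = []
    for _ in range(n_trials):
        noisy = image + rng.normal(0.0, noise_std, size=image.shape)
        scores.append(np.corrcoef(reference, explain_fn(noisy).ravel())[0, 1])
    return float(np.mean(scores))

def plausibility(heatmap, lesion_mask, threshold=0.5):
    """IoU between the thresholded heatmap and a ground-truth lesion mask."""
    pred = heatmap >= threshold
    union = np.logical_or(pred, lesion_mask).sum()
    return float(np.logical_and(pred, lesion_mask).sum() / union) if union else 0.0
```

The remaining two criteria are harder to automate in this way: fidelity probes the model's internal mechanisms, and usefulness typically requires measuring task performance with and without the explanation.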
Problem

Research questions and friction points this paper is trying to address.

No established techniques exist for assessing the quality of explanations produced by AI devices
Explainability features are deployed without evidence of the value they actually provide
AI-based medical devices lack standard criteria for evaluating and reporting explanation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

A systematic framework for assessing and reporting explainable AI features
Four evaluation criteria: consistency, plausibility, fidelity, and usefulness
A scorecard providing a complete description and evaluation of an explainability method (see the sketch after this list)
Case study: Ablation CAM and Eigen CAM heatmaps for breast lesion detection on synthetic mammograms
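A machine-readable version of such a scorecard might look like the sketch below. The field names and numeric values are placeholders assumed for illustration, not the paper's published template.

```python
# Hypothetical scorecard record; field names and values are placeholders,
# not the paper's published template.
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass
class XAIScorecard:
    method: str                  # e.g., "Eigen-CAM"
    task: str                    # e.g., "breast lesion detection"
    consistency: float           # stability under similar inputs, in [0, 1]
    plausibility: float          # agreement with ground truth, in [0, 1]
    fidelity: Optional[float]    # alignment with model internals; None if not run
    usefulness: Optional[float]  # impact on task performance; None if not run
    notes: str = ""

card = XAIScorecard(
    method="Eigen-CAM",
    task="breast lesion detection on synthetic mammograms",
    consistency=0.87, plausibility=0.62, fidelity=0.71,  # placeholder values
    usefulness=None,
    notes="Usefulness typically requires a reader study; not evaluated here.",
)
print(json.dumps(asdict(card), indent=2))
```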
👥 Authors
Miguel A. Lago
U.S. Food and Drug Administration, Silver Spring, MD 20993, USA
Ghada Zamzmi
FDA/CDRH/OSEL/DIDSR
Brandon Eich
U.S. Food and Drug Administration, Silver Spring, MD 20993, USA
Jana G. Delfino
U.S. Food and Drug Administration, Silver Spring, MD 20993, USA