Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics

📅 2024-09-25
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
The eXplainable AI (XAI) field has long lacked systematic evaluation frameworks, hindering principled method selection. To address this, we introduce LATEC, a large-scale XAI benchmark that systematically varies model architectures and input modalities while critically evaluating 17 explanation methods with 20 metrics, yielding 7,560 examined combinations. Our analysis reveals substantial disagreement among metrics, undermining the reliability of conventional single-metric rankings; notably, Expected Gradients, a method overlooked in prior comparative studies, emerges as the top performer across diverse modalities and architectures. We publicly release all 326k saliency maps and 378k metric scores as a (meta-)evaluation dataset, accompanied by guidance for task-aware XAI method selection. The core contributions are (1) a multi-dimensional, controlled evaluation paradigm and (2) a principled scheme for analyzing metric consistency and interdependence.
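For context on the top-performing method: Expected Gradients (Erion et al., 2021) is Integrated Gradients with baselines sampled from a reference distribution instead of a single fixed baseline. A minimal PyTorch sketch of the estimator (the function name and interface are illustrative, not the LATEC implementation):

```python
import torch

def expected_gradients(model, x, references, target, n_samples=50):
    """Monte-Carlo Expected Gradients: Integrated Gradients averaged over
    baselines drawn from a reference set, with the path position alpha
    sampled uniformly in [0, 1] for each draw."""
    attribution = torch.zeros_like(x)
    for _ in range(n_samples):
        # Draw one reference x' and one path position alpha ~ U(0, 1).
        ref = references[torch.randint(len(references), (1,))]
        alpha = torch.rand(1, device=x.device)
        point = (ref + alpha * (x - ref)).detach().requires_grad_(True)
        # Gradient of the target-class logit at the interpolated point.
        logit = model(point)[:, target].sum()
        grad = torch.autograd.grad(logit, point)[0]
        # Accumulate (x - x') * dF/dx and average over samples.
        attribution += (x - ref) * grad
    return attribution / n_samples
```

In practice, Captum's `GradientShap` implements a closely related estimator (the same sampled-baseline path gradients, with added input noise), so a library call is usually preferable to hand-rolling the loop above.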

📝 Abstract
Explainable AI (XAI) is a rapidly growing domain with a myriad of proposed methods as well as metrics aiming to evaluate their efficacy. However, current studies are often of limited scope, examining only a handful of XAI methods and ignoring underlying design parameters for performance, such as the model architecture or the nature of input data. Moreover, they often rely on one or a few metrics and neglect thorough validation, increasing the risk of selection bias and ignoring discrepancies among metrics. These shortcomings leave practitioners confused about which method to choose for their problem. In response, we introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics. We systematically incorporate vital design parameters like varied architectures and diverse input modalities, resulting in 7,560 examined combinations. Through LATEC, we showcase the high risk of conflicting metrics leading to unreliable rankings and consequently propose a more robust evaluation scheme. Further, we comprehensively evaluate various XAI methods to assist practitioners in selecting appropriate methods aligning with their needs. Curiously, the emerging top-performing method, Expected Gradients, is not examined in any relevant related study. LATEC reinforces its role in future XAI research by publicly releasing all 326k saliency maps and 378k metric scores as a (meta-)evaluation dataset. The benchmark is hosted at: https://github.com/IML-DKFZ/latec.
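The "conflicting metrics" finding can be probed directly from the released scores by comparing the method rankings each metric induces, for example via pairwise Spearman correlation. A sketch with synthetic data (the array shape mirrors the paper's 17 methods and 20 metrics, but the scores and aggregation here are purely illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-in: one aggregated score per (method, metric) pair.
rng = np.random.default_rng(0)
scores = rng.random((17, 20))  # 17 XAI methods x 20 evaluation metrics

# Spearman correlation between the method rankings induced by each pair
# of metrics; spearmanr treats columns as variables, so `rho` is a
# 20 x 20 inter-metric agreement matrix.
rho, _ = spearmanr(scores)
upper = rho[np.triu_indices(20, k=1)]
print(f"mean inter-metric rank agreement: {upper.mean():.2f}")
print(f"most conflicting pair correlation: {upper.min():.2f}")
```

Low or negative off-diagonal entries flag metric pairs that rank the same methods in opposing orders, which is exactly the failure mode that makes single-metric rankings unreliable.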
Problem

Research questions and friction points this paper is trying to address.

Interpretable AI
Evaluation Framework
Uncertainty in Selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

LATEC
XAI Evaluation
Expected Gradients
Lukas Klein
EPFL, USZ
Machine Learning · Biotech · Computer Vision
Carsten T. Lüth
PhD Student @ Interactive Machine Learning Research Group
Label Efficient Training of Deep Learning Models
Udo Schlegel
PostDoc, LMU München
Explainable AI · Deep Learning · Visual Analytics · Time Series Analysis
Till J. Bungert
German Cancer Research Center (DKFZ), Interactive Machine Learning Group, Germany; Helmholtz Imaging, German Cancer Research Center (DKFZ), Germany; Heidelberg University, Department of Computer Science, Germany
Mennatallah El-Assady
ETH Zürich
Visualization · Intelligence Augmentation · XAI · Interactive Machine Learning · Natural Language
Paul F. Jäger
German Cancer Research Center (DKFZ), Interactive Machine Learning Group, Germany; Helmholtz Imaging, German Cancer Research Center (DKFZ), Germany