On the Robustness of Global Feature Effect Explanations

📅 2024-06-13
🏛️ ECML/PKDD
📈 Citations: 2
Influential: 0
🤖 AI Summary
This study addresses the robustness of global post-hoc interpretability methods, such as Partial Dependence Plots (PDP) and Accumulated Local Effects (ALE), for black-box predictive models on tabular data under data and model perturbations. Motivated by the fragility of these explanations in model debugging and scientific discovery, we propose the first theoretical bounds quantifying explanation deviation under worst-case and best-case perturbations. Our approach combines rigorous theoretical analysis with extensive empirical evaluation on multiple real-world and synthetic datasets, systematically assessing explanation sensitivity to controlled perturbations. The results demonstrate that even minor perturbations can substantially distort global feature effect estimates, up to a complete reversal of the effect direction in adversarial cases. This work establishes a formal robustness framework for global interpretability, filling a gap in the theoretical foundations of trustworthy XAI, and provides a quantifiable diagnostic benchmark for evaluating the stability and fidelity of global explanations.

📝 Abstract
We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.
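As background for the methods the abstract names, the partial dependence of a model on one feature is the average prediction obtained by sweeping that feature over a grid while leaving all other features at their observed values. A minimal sketch of this estimator (the toy `model`, data, and grid below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Estimate the partial dependence of `model` on one feature.

    For each grid value v, overwrite the chosen feature column in every
    row of X with v and average the predictions:
    PD_j(v) = (1/n) * sum_i f(v, x_{-j}^(i)).
    """
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        curve.append(model(X_mod).mean())
    return np.array(curve)

# Toy model with a known linear effect of feature 0: f(x) = 2*x0 + x1.
model = lambda X: 2.0 * X[:, 0] + X[:, 1]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2, 2, 5)
pd_curve = partial_dependence(model, X, 0, grid)
```

For this toy model the recovered effect of feature 0 is a line of slope 2, matching the true coefficient; ALE differs in that it accumulates local prediction differences within bins rather than averaging over the full marginal.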
Problem

Research questions and friction points this paper is trying to address.

Study robustness of global post-hoc explanations for tabular data models
Assess vulnerability of feature effects to data and model perturbations
Quantify gap between best and worst-case interpretation scenarios
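One way to probe the vulnerability described above is to recompute a feature effect estimate after a small data perturbation and record the largest deviation over the grid, a toy analogue of the best/worst-case gap the paper bounds. A sketch under that framing (the model, noise scale, and helper `pdp_estimate` are illustrative assumptions, not the paper's method):

```python
import numpy as np

def pdp_estimate(model, X, feature, grid):
    """Simple partial-dependence-style estimate over a feature grid."""
    out = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        out.append(model(X_mod).mean())
    return np.array(out)

# Nonlinear toy model; feature 0 has a sinusoidal effect.
model = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
grid = np.linspace(-2, 2, 9)

base = pdp_estimate(model, X, 0, grid)
# Perturb the data slightly and re-estimate the effect curve.
noisy_X = X + rng.normal(scale=0.05, size=X.shape)
perturbed = pdp_estimate(model, noisy_X, 0, grid)
deviation = np.max(np.abs(base - perturbed))  # largest gap over the grid
```

Here the deviation stays small because the toy perturbation is benign; the paper's point is that adversarially chosen perturbations can drive this gap far higher, up to reversing the apparent effect direction.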
Innovation

Methods, ideas, or system contributions that make the work stand out.

Theoretical bounds for evaluating explanation robustness
Robustness analysis of partial dependence plots and accumulated local effects
Quantifying the gap between best and worst-case global interpretations