🤖 AI Summary
This work addresses the instability of explanations produced by existing explainable artificial intelligence (XAI) methods under real-world conditions such as input perturbations, feature redundancy, and model updates, and the resulting lack of systematic evaluation of explanation reliability. The study formally defines and quantifies XAI reliability through four axioms: robustness, feature redundancy consistency, model evolution smoothness, and resilience to distribution shifts. Building on these, it introduces a computable Explanation Reliability Index (ERI), along with a temporal variant, ERI-T, tailored to time-series models. The authors also establish ERI-Bench, the first comprehensive benchmark dedicated to evaluating explanation stability. Experiments show that widely used methods such as SHAP and Integrated Gradients often exhibit significant reliability deficiencies, whereas ERI effectively identifies and quantifies these instabilities, providing a theoretical and empirical foundation for trustworthy XAI.
📝 Abstract
In recent years, explaining decisions made by complex machine learning models has become essential in high-stakes domains such as energy systems, healthcare, finance, and autonomous systems. However, the reliability of these explanations, namely, whether they remain stable and consistent under realistic, non-adversarial changes, remains largely unmeasured. Widely used methods such as SHAP and Integrated Gradients (IG) are well-motivated by axiomatic notions of attribution, yet their explanations can vary substantially even under system-level conditions, including small input perturbations, correlated representations, and minor model updates. Such variability undermines explanation reliability, as reliable explanations should remain consistent across equivalent input representations and small, performance-preserving model changes. We introduce the Explanation Reliability Index (ERI), a family of metrics that quantifies explanation stability under four reliability axioms: robustness to small input perturbations, consistency under feature redundancy, smoothness across model evolution, and resilience to mild distributional shifts. For each axiom, we derive formal guarantees, including Lipschitz-type bounds and temporal stability results. We further propose ERI-T, a dedicated measure of temporal reliability for sequential models, and introduce ERI-Bench, a benchmark designed to systematically stress-test explanation reliability across synthetic and real-world datasets. Experimental results reveal widespread reliability failures in popular explanation methods, showing that explanations can be unstable under realistic deployment conditions. By exposing and quantifying these instabilities, ERI enables principled assessment of explanation reliability and supports more trustworthy explainable AI (XAI) systems.
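To make the first reliability axiom (robustness to small input perturbations) concrete, here is a minimal sketch of how such a stability score might be computed. This is an illustrative assumption, not the paper's actual ERI definition: the toy gradient-times-input explanation, the cosine-similarity aggregation, and the noise scale are all choices made here for demonstration.

```python
import numpy as np

def robustness_score(explain_fn, x, eps=0.01, n_trials=20, seed=0):
    """Average cosine similarity between the explanation of x and the
    explanations of slightly perturbed copies of x (higher = more stable).
    Illustrative stability measure only, not the paper's ERI formula."""
    rng = np.random.default_rng(seed)
    base = explain_fn(x)
    sims = []
    for _ in range(n_trials):
        # Small, non-adversarial Gaussian perturbation of the input.
        x_pert = x + eps * rng.standard_normal(x.shape)
        pert = explain_fn(x_pert)
        sims.append(
            np.dot(base, pert)
            / (np.linalg.norm(base) * np.linalg.norm(pert) + 1e-12)
        )
    return float(np.mean(sims))

# Toy setting: linear model f(x) = w . x with a gradient-times-input
# explanation w * x (hypothetical stand-in for SHAP or IG attributions).
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, -2.0])
score = robustness_score(lambda z: w * z, x)
```

For a stable explanation method the score stays near 1; an unstable one, whose attributions swing under tiny input changes, drops toward 0 or below. The other three axioms could be stress-tested analogously, e.g. by duplicating a feature (redundancy consistency) or comparing explanations before and after a small, performance-preserving fine-tuning step (model evolution smoothness).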