When Can You Trust Your Explanations? A Robustness Analysis on Feature Importances

📅 2024-06-20
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work evaluates the robustness of feature importance explanations in eXplainable Artificial Intelligence (XAI) under non-adversarial perturbations. Existing evaluations rarely model the natural perturbations intrinsic to the data manifold; the authors therefore present a systematic analysis of explanation fragility under such realistic, non-adversarial distortions. To keep perturbations faithful to the underlying data distribution, they propose a manifold-aware perturbation generation strategy grounded in the manifold hypothesis. They further introduce a multi-explainer ensemble framework that aggregates explanations via consistency-based fusion, improving both robustness and interpretability. Experiments on feature importances from neural networks trained on multiple tabular datasets show that mainstream attribution methods, including gradient-based explainers and SHAP variants, exhibit substantial robustness deficiencies, while the ensemble framework markedly improves explanation stability and decision trustworthiness. An open-source evaluation framework is released to enable reproducible, quantitative robustness assessment across diverse XAI methods.
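The manifold-aware perturbation idea can be sketched as follows. This is an illustrative stand-in, not the authors' code: a PCA projection plays the role of a learned encoder/decoder, and noise is added in the latent space so that perturbed points stay close to the data manifold.

```python
import numpy as np

def manifold_perturb(X, n_components=2, noise_scale=0.1, seed=0):
    """Perturb rows of X by adding Gaussian noise in a PCA latent space.

    A linear PCA encode/decode stands in for a trained autoencoder;
    noise applied in the latent space keeps perturbations near the
    subspace spanned by the data, per the manifold hypothesis.
    """
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    Xc = X - mu
    # Principal directions via SVD act as a linear "encoder".
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T            # (n_features, n_components)
    Z = Xc @ V                         # encode to latent space
    Z_pert = Z + noise_scale * rng.standard_normal(Z.shape)
    return Z_pert @ V.T + mu           # decode back to input space

X = np.random.default_rng(1).standard_normal((100, 5))
X_pert = manifold_perturb(X)
print(X_pert.shape)  # (100, 5)
```

In the paper's setting the encoder/decoder would be nonlinear and learned from the data; the key point is that noise is injected in a representation of the manifold rather than directly in input space.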


📝 Abstract
Recent legislative regulations have underlined the need for accountable and transparent artificial intelligence systems and have contributed to a growing interest in the Explainable Artificial Intelligence (XAI) field. Nonetheless, the lack of standardized criteria to validate explanation methodologies remains a major obstacle to developing trustworthy systems. We address a crucial yet often overlooked aspect of XAI, the robustness of explanations, which plays a central role in ensuring trust in both the system and the provided explanation. To this end, we propose a novel approach to analyse the robustness of neural network explanations to non-adversarial perturbations, leveraging the manifold hypothesis to produce new perturbed datapoints that resemble the observed data distribution. We additionally present an ensemble method to aggregate various explanations, showing how merging explanations can be beneficial for both understanding the model's decision and evaluating the robustness. The aim of our work is to provide practitioners with a framework for evaluating the trustworthiness of model explanations. Experimental results on feature importances derived from neural networks applied to tabular datasets highlight the importance of robust explanations in practical applications.
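A robustness evaluation in the spirit of the abstract compares feature-importance vectors computed before and after perturbation. The metric below (top-k overlap) is one common stability measure and is only a hypothetical example; the paper's actual metrics are not specified in this summary.

```python
import numpy as np

def topk_overlap(imp_a, imp_b, k=3):
    """Fraction of shared features among the k most important ones.

    1.0 means the top-k feature sets agree exactly; 0.0 means they
    are disjoint, i.e. the explanation changed drastically.
    """
    top_a = set(np.argsort(np.abs(imp_a))[-k:])
    top_b = set(np.argsort(np.abs(imp_b))[-k:])
    return len(top_a & top_b) / k

# Importances on original vs. perturbed inputs (toy values).
imp_orig = np.array([0.50, 0.30, 0.10, 0.05, 0.05])
imp_pert = np.array([0.45, 0.35, 0.12, 0.05, 0.03])
print(topk_overlap(imp_orig, imp_pert))  # 1.0
```

Averaging such a score over many manifold-consistent perturbations gives a quantitative handle on how trustworthy an explanation method is under realistic input noise.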
Problem

Research questions and friction points this paper is trying to address.

Analyzing robustness of neural network explanations to perturbations
Proposing ensemble method to aggregate and evaluate explanations
Providing framework for assessing trustworthiness of model explanations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing neural network explanation robustness via perturbations
Leveraging manifold hypothesis for perturbed datapoints generation
Aggregating explanations via an ensemble method for model understanding and robustness evaluation
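The ensemble idea above can be sketched as consistency-based fusion. The weighting scheme here is illustrative, not taken from the paper: each explainer's importance vector is normalized, each explainer is weighted by its average agreement (cosine similarity) with the others, and the fused explanation is the weighted average, so an outlier explainer contributes less.

```python
import numpy as np

def fuse_explanations(importances):
    """Fuse per-explainer importance vectors by consistency weighting.

    importances: (n_explainers, n_features) array of attributions.
    Returns a single (n_features,) fused importance vector.
    """
    M = np.abs(importances)
    M = M / M.sum(axis=1, keepdims=True)   # normalize each explainer
    norms = np.linalg.norm(M, axis=1)
    # Pairwise cosine similarity between explainers.
    sims = (M @ M.T) / (norms[:, None] * norms[None, :])
    np.fill_diagonal(sims, 0.0)
    weights = sims.mean(axis=1)            # agreement with the others
    weights = weights / weights.sum()
    return weights @ M                     # weighted average explanation

expl = np.array([[0.6, 0.3, 0.1],
                 [0.5, 0.4, 0.1],
                 [0.1, 0.2, 0.7]])  # third explainer disagrees
fused = fuse_explanations(expl)
print(fused)
```

Because the first two explainers agree, the fused vector keeps their ranking (feature 0 most important) while the dissenting third explainer is down-weighted.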