Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluations of feature attribution robustness conflate changes in model outputs with instability in attribution maps, thereby obscuring inherent flaws in attribution methods. Method: The authors redefine attribution robustness as the stability of attribution maps under semantically similar inputs—specifically, adversarial perturbations generated by GANs—while permitting reasonable variations in model predictions. They propose a novel robustness metric based on structural similarity between attribution maps. Results: Under this refined framework, mainstream attribution methods—including Grad-CAM and Integrated Gradients—exhibit significantly reduced robustness scores, exposing their intrinsic instability. This work establishes a more objective and interpretable evaluation paradigm for attribution robustness, providing a principled benchmark for verifying the reliability of explainable AI systems.
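The summary above describes a robustness metric based on structural similarity between attribution maps of semantically similar inputs. A minimal sketch of that idea is below, assuming a simplified single-window SSIM (the paper's exact metric and windowing may differ) and a placeholder `attr_fn` standing in for any attribution method such as Grad-CAM or Integrated Gradients; the GAN-generated perturbations are mocked here with small additive noise.

```python
import numpy as np

def ssim_global(a, b, data_range=1.0):
    """Simplified single-window SSIM between two attribution maps.

    Uses the standard constants K1=0.01, K2=0.03; windowed SSIM
    averages this quantity over local patches instead.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

def attribution_robustness(attr_fn, x, perturbed_inputs):
    """Robustness score: mean SSIM between the attribution map of x and
    the maps of its semantically similar perturbations (in the paper,
    GAN-generated). Higher means more stable attributions."""
    base = attr_fn(x)
    return float(np.mean([ssim_global(base, attr_fn(xp))
                          for xp in perturbed_inputs]))

# Toy check: identity "attribution method" on a random map with
# small-noise perturbations standing in for GAN outputs.
rng = np.random.default_rng(0)
x = rng.random((8, 8))
attr_fn = lambda inp: inp  # hypothetical placeholder attribution
perturbed = [x + 0.01 * rng.standard_normal((8, 8)) for _ in range(3)]
score = attribution_robustness(attr_fn, x, perturbed)
```

Under this definition a perfectly stable method scores 1.0 (identical maps), while a method whose maps change sharply under semantically similar inputs scores much lower, which is the instability the paper reports for mainstream attribution methods.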

📝 Abstract
This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness, which largely ignores differences in the model's outputs, and introduces a new way of evaluating the robustness of attribution methods. Specifically, we propose a new definition of similar inputs, a new robustness metric, and a novel method based on generative adversarial networks to generate these inputs. In addition, we present a comprehensive evaluation with existing metrics and state-of-the-art attribution methods. Our findings highlight the need for a more objective metric that reveals the weaknesses of an attribution method rather than those of the neural network, thus providing a more accurate evaluation of the robustness of attribution methods.
Problem

Research questions and friction points this paper is trying to address.

Challenges the current notion of attributional robustness, which ignores differences in model outputs
Proposes a new definition of similar inputs and a new robustness metric
Introduces a GAN-based method for generating semantically similar adversarial inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a new definition of similar inputs
Introduces a new robustness evaluation metric
Uses generative adversarial networks to generate evaluation inputs
👥 Authors
Panagiota Kiourti (Boston University)
Anu Singh (Intuit Inc.)
Preeti Duraipandian (Intuit Inc.)
Weichao Zhou (Boston University)
Wenchao Li (Associate Professor, Boston University; Neuro-Symbolic AI, AI Safety)