🤖 AI Summary
Current evaluations of feature attribution robustness conflate changes in model outputs with instability in attribution maps, obscuring flaws inherent to the attribution methods themselves. Method: The authors redefine attribution robustness as the stability of attribution maps under semantically similar inputs (specifically, adversarial perturbations generated by GANs) while permitting reasonable variation in model predictions. They propose a novel robustness metric based on structural similarity between attribution maps. Results: Under this refined framework, mainstream attribution methods, including Grad-CAM and Integrated Gradients, show markedly lower robustness scores, exposing their intrinsic instability. The work establishes a more objective and interpretable evaluation paradigm for attribution robustness, providing a principled benchmark for assessing the reliability of explainable-AI systems.
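The structural-similarity idea lends itself to a compact sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes attributions are 2-D saliency maps of equal shape, uses scikit-image's `structural_similarity` as the SSIM backend, and the helper names `normalize` and `attribution_robustness` are hypothetical.

```python
# Hypothetical sketch of an SSIM-based attribution-robustness score.
# Assumptions: attributions are 2-D saliency maps of equal shape
# (at least 7x7, SSIM's default window size).
import numpy as np
from skimage.metrics import structural_similarity


def normalize(attr: np.ndarray) -> np.ndarray:
    """Rescale an attribution map to [0, 1] so maps are comparable."""
    attr = attr - attr.min()
    rng = attr.max()
    return attr / rng if rng > 0 else attr


def attribution_robustness(attr_original: np.ndarray,
                           attr_perturbed: np.ndarray) -> float:
    """Structural similarity between two attribution maps.

    Scores near 1 mean the explanation is stable under the
    semantically similar input; low scores signal instability
    of the attribution method itself.
    """
    a = normalize(attr_original)
    b = normalize(attr_perturbed)
    return structural_similarity(a, b, data_range=1.0)
```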
📝 Abstract
This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness, which largely ignores differences in the model's outputs, and introduces a new way of evaluating the robustness of attribution methods. Specifically, we propose a new definition of similar inputs, a new robustness metric, and a novel method based on generative adversarial networks for generating such inputs. In addition, we present a comprehensive evaluation with existing metrics and state-of-the-art attribution methods. Our findings highlight the need for a more objective metric that reveals the weaknesses of an attribution method rather than those of the neural network, thus providing a more accurate evaluation of the robustness of attribution methods.
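To make the evaluation protocol concrete, here is one plausible shape of the loop. This is a sketch under stated assumptions, not the paper's code: `attribute(model, x)` is assumed to return a saliency map, `generate_similar(x)` is a hypothetical stand-in for the GAN-based generator of semantically similar inputs, and `attribution_robustness` is the SSIM-based score sketched above.

```python
# Hypothetical evaluation loop. `attribute` and `generate_similar`
# stand in for the paper's components and are not actual APIs from
# the work. Reuses attribution_robustness from the sketch above.
import numpy as np


def evaluate_attribution_robustness(model, attribute, generate_similar,
                                    inputs, n_perturbations=10):
    """Average attribution robustness over GAN-generated similar inputs.

    Pairs whose predictions diverge are deliberately not discarded:
    reasonable variation in model outputs is allowed, and only the
    stability of the attribution maps is judged.
    """
    scores = []
    for x in inputs:
        attr_x = attribute(model, x)
        for _ in range(n_perturbations):
            x_sim = generate_similar(x)          # semantically similar input
            attr_sim = attribute(model, x_sim)   # its attribution map
            scores.append(attribution_robustness(attr_x, attr_sim))
    return float(np.mean(scores))
```

By averaging only over attribution-map similarity, this design attributes low scores to the explanation method rather than to output changes in the network, which is the distinction the paper draws against prior metrics.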