On Spectral Properties of Gradient-based Explanation Methods

📅 2025-08-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: The reliability of interpretability methods for deep neural networks is limited by weak theoretical foundations and insufficient formalization. Method: This paper combines probabilistic modeling with spectral analysis to show that gradient-based attribution carries a pervasive spectral bias. Building on this analysis, it gives a theoretical justification for key design choices, including squared gradients and input perturbations, and proposes two remedies: a mechanism for setting a standard perturbation scale, and SpectralLens, a spectral-aware attribution aggregation method intended to make explanations more consistent. Results: Quantitative experiments support the theoretical insights, with SpectralLens reported to improve attribution stability and reliability across architectures and datasets. The work offers both a spectral-theoretic foundation for explainable AI and a practical tool for robust model interpretation.

📝 Abstract

Understanding the behavior of deep networks is crucial to increase our confidence in their results. Despite an extensive body of work on explaining their predictions, researchers have faced reliability issues, which can be attributed to insufficient formalism. In our research, we adopt novel probabilistic and spectral perspectives to formally analyze explanation methods. Our study reveals a pervasive spectral bias stemming from the use of gradients, and sheds light on some common design choices that have been discovered experimentally, in particular the use of squared gradients and input perturbation. We further characterize how the choice of perturbation hyperparameters in explanation methods, such as SmoothGrad, can lead to inconsistent explanations, and we introduce two remedies based on our proposed formalism: (i) a mechanism to determine a standard perturbation scale, and (ii) an aggregation method which we call SpectralLens. Finally, we substantiate our theoretical results through quantitative evaluations.
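To make the point about perturbation hyperparameters concrete, here is a minimal sketch of a SmoothGrad-style estimator on a toy model. Everything in it is an assumption chosen for illustration and is not taken from the paper: the toy model `f(x) = sum_i amps[i] * sin(freqs[i] * x[i])`, the helper names `grad_f` and `smoothgrad`, and the two noise scales. It does not implement SpectralLens or the paper's perturbation-scale mechanism; it only shows that the same input can receive a different top-ranked feature depending on the noise scale `sigma`.

```python
import numpy as np

def grad_f(x, freqs, amps):
    # Analytic gradient of the toy model f(x) = sum_i amps[i] * sin(freqs[i] * x[i]).
    # (Hypothetical model, used only so the gradient is known in closed form.)
    return amps * freqs * np.cos(freqs * x)

def smoothgrad(x, sigma, freqs, amps, n_samples=20000, squared=False, seed=0):
    # SmoothGrad-style attribution: average the gradient over Gaussian
    # perturbations of the input. With squared=True the squared gradient
    # is averaged instead.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    grads = grad_f(x + noise, freqs, amps)  # shape: (n_samples, n_features)
    if squared:
        grads = grads ** 2
    return grads.mean(axis=0)

x = np.zeros(2)
freqs = np.array([1.0, 8.0])  # feature 0 is low frequency, feature 1 is high frequency
amps = np.array([1.0, 0.5])

for sigma in (0.01, 0.5):
    attr = smoothgrad(x, sigma, freqs, amps)
    top = int(np.argmax(np.abs(attr)))
    print(f"sigma={sigma}: attribution={attr}, top feature={top}")
```

In this toy setting the small noise scale ranks the high-frequency feature first, while the larger one suppresses it and ranks the low-frequency feature first, which is the kind of hyperparameter-dependent inconsistency the abstract describes.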

Problem

Research questions and friction points this paper is trying to address.

Analyzes spectral bias in gradient-based explanation methods

Addresses inconsistent explanations caused by the choice of perturbation hyperparameters in methods such as SmoothGrad

Proposes remedies: standard perturbation scale and SpectralLens
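One way to see why the perturbation scale can matter is a standard Gaussian-smoothing identity, given here as an illustration and not as the paper's own derivation. Assume a one-dimensional sinusoidal component $f(x) = \sin(\omega x)$ and noise $\varepsilon \sim \mathcal{N}(0, \sigma^2)$; both are assumptions made for this example.

```latex
\mathbb{E}_{\varepsilon}\!\left[ f'(x + \varepsilon) \right]
  = \omega \, \mathbb{E}_{\varepsilon}\!\left[ \cos\bigl(\omega (x + \varepsilon)\bigr) \right]
  = \omega \cos(\omega x) \, e^{-\sigma^{2} \omega^{2} / 2}
```

The gradient itself scales with $\omega$, so it emphasizes high frequencies, while averaging over perturbations damps each frequency by $e^{-\sigma^{2}\omega^{2}/2}$. Under these assumptions, different values of $\sigma$ weight the frequency content differently and can therefore yield different attributions.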

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adopt probabilistic and spectral perspectives

Characterize the impact of perturbation hyperparameters

Introduce SpectralLens aggregation method

🔎 Similar Papers

Uncertainty Quantification for Gradient-based Explanations in Neural Networks