Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

📅 2026-05-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing integrated gradient methods are susceptible to gradient noise in the input space, and their interpolation paths may deviate from the true data manifold, leading to unreliable attributions. This work proposes constructing the integration path in the latent space of a pretrained variational autoencoder and keeping it close to the learned generative manifold by decoding intermediate latent states. By doing so, the method introduces a manifold-alignment mechanism into the guided integrated gradients framework, reducing exposure to regions that correspond to implausible samples. Experiments across multiple datasets and classifiers show that the proposed approach outperforms existing path-based attribution methods. Both qualitative and quantitative results support the faithfulness of its attributions and its reduced sensitivity to off-manifold noise.
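For context, the baseline that the summary refers to, standard Integrated Gradients along a straight line in input space, can be sketched in a few lines of NumPy. This is only an illustrative sketch: the toy sigmoid classifier, its weights `w`, the all-zeros baseline, and the helper names are assumptions made here and do not come from the paper.

```python
import numpy as np

def model(x, w):
    # Hypothetical stand-in for a deep classifier: sigmoid of a linear score.
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def model_grad(x, w):
    # Analytic gradient of the toy model with respect to the input x.
    s = model(x, w)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, w, steps=200):
    # Midpoint Riemann sum over the straight-line path baseline -> x.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([model_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([1.5, -2.0, 0.5])      # assumed toy weights
x = np.array([1.0, 0.5, 1.0])       # assumed input
baseline = np.zeros(3)              # assumed all-zeros baseline

attr = integrated_gradients(x, baseline, w)
# Completeness axiom: attributions should sum to f(x) - f(baseline).
gap = abs(attr.sum() - (model(x, w) - model(baseline, w)))
print(attr, gap)
```

Every interpolated point here is a convex combination of the baseline and the input, so nothing keeps it close to realistic data. That is the off-manifold concern the summary describes.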
📝 Abstract
Feature attribution is central to diagnosing and trusting deep neural networks, and Integrated Gradients (IG) is widely used due to its axiomatic properties. However, IG can yield unreliable explanations when the integration path between a baseline and the input passes through regions with noisy gradients. While Guided Integrated Gradients reduces this sensitivity by adaptively updating low-gradient-magnitude features, input-space guidance still produces intermediate inputs that deviate from the data manifold. To address this limitation, we propose Manifold-Aligned Guided Integrated Gradients (MA-GIG), which constructs attribution paths in the latent space of a pre-trained variational autoencoder. By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold and reduces exposure to implausible input-space regions. Through qualitative and quantitative evaluations, we demonstrate that MA-GIG produces faithful explanations by aggregating gradients on path features proximal to the input. Consequently, our method reduces off-manifold noise and outperforms prior path-based attribution methods across multiple datasets and classifiers. Our code is available at https://github.com/leekwoon/ma-gig/.
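The core idea of building the path in latent space and decoding each intermediate state can be illustrated with a simplified sketch. It is not the MA-GIG algorithm itself: it uses plain linear interpolation between two latent codes and omits the guided, low-gradient-magnitude feature selection the abstract mentions. The `tanh` decoder, the random weights `A` and `w`, the latent codes, and all function names are hypothetical stand-ins for a pre-trained VAE and classifier. The sketch also ignores reconstruction error, that is, it treats the decoded endpoint as the input being explained.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))   # hypothetical decoder weights: 2-D latent -> 4-D input
w = rng.normal(size=4)        # hypothetical classifier weights

def decode(z):
    # Toy stand-in for a VAE decoder; outputs lie on a 2-D surface in input space.
    return np.tanh(A @ z)

def model(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def model_grad(x):
    s = model(x)
    return s * (1.0 - s) * w

def latent_path_attribution(z_base, z_input, steps=400):
    # Interpolate in latent space, then decode every intermediate state so that
    # each path point lies in the decoder's range (the "learned manifold").
    alphas = np.linspace(0.0, 1.0, steps + 1)
    points = np.stack([decode(z_base + a * (z_input - z_base)) for a in alphas])
    attr = np.zeros(points.shape[1])
    # Accumulate gradient * input-space displacement over each path segment.
    for x_prev, x_next in zip(points[:-1], points[1:]):
        attr += model_grad(0.5 * (x_prev + x_next)) * (x_next - x_prev)
    return attr, points

z_base = np.zeros(2)                 # assumed latent code of the baseline
z_input = np.array([1.0, -0.5])      # assumed latent code of the input
attr, points = latent_path_attribution(z_base, z_input)
# Path integrals satisfy completeness between the decoded endpoints.
gap = abs(attr.sum() - (model(points[-1]) - model(points[0])))
print(attr, gap)
```

Because the attribution is a path integral in input space, the per-feature values still sum (approximately) to the difference in model output between the two decoded endpoints. Only the route between them changes.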
Problem

Research questions and friction points this paper is trying to address.

feature attribution
Integrated Gradients
data manifold
off-manifold noise
reliable explanations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Manifold-Aligned
Guided Integrated Gradients
Latent Space
Feature Attribution
Variational Autoencoder