🤖 AI Summary
Problem: Integrated gradients (IG) suffers reduced explanation fidelity when the sampling distribution along its integration path deviates significantly from the natural image manifold, undermining attribution reliability.
Method: We propose semi-optimal feature-suppression sampling, a noise-free, differentiable technique that suppresses input features along the integration path to better align it with the underlying data manifold, thereby theoretically guaranteeing a lower bound on explanation certainty. Our method operates within the standard IG framework without requiring auxiliary modules (e.g., attention or additional activation maps).
Results: Evaluated on ImageNet across multiple modern vision models, our approach consistently outperforms state-of-the-art baselines. The resulting pixel-level attributions exhibit superior semantic coherence and localization accuracy while remaining computationally efficient and unbiased.
📝 Abstract
Image attribution analysis seeks to highlight the feature representations learned by visual models so that the highlighted feature maps reflect the pixel-wise importance of the input. Gradient integration is a building block of attribution analysis: it integrates the gradients of multiple derived samples to highlight the semantic features relevant to an inference. This building block is often combined with other information from visual models, such as activation or attention maps, to form the final explanation. Yet our theoretical analysis demonstrates that the degree to which the sample distribution used in gradient integration aligns with the natural image distribution gives a lower bound on explanation certainty. Prior works derive samples by adding noise to images, and the resulting noise distributions can lead to low explanation certainty. Counter-intuitively, our experiments show that such extra information can saturate neural networks. Building trustworthy attribution analysis therefore requires settling this sample-distribution misalignment problem. Instead of adding extra information to input images, we present a semi-optimal sampling approach that suppresses features from the inputs. The distribution of feature-suppressed samples is approximately identical to the distribution of natural images. Our extensive quantitative evaluation on the large-scale ImageNet dataset affirms that our approach is effective and yields more satisfactory explanations than state-of-the-art baselines across all experimental models.
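To make the gradient-integration building block concrete, here is a minimal sketch. It is a hedged illustration, not the paper's implementation: a toy analytic model `f(x) = sum(w * x**2)` (with closed-form gradient) stands in for a real network, and path samples are produced by scaling input features toward a zero baseline — one simple form of feature suppression that keeps each sample a "dimmed" image rather than a noisy one. All function names are illustrative assumptions.

```python
import numpy as np

def f(x, w):
    # Toy stand-in for a vision model's scalar output (e.g., a class logit).
    return np.sum(w * x ** 2)

def grad_f(x, w):
    # Closed-form gradient of the toy model; a real network would use backprop.
    return 2.0 * w * x

def suppress(x, alpha):
    # Feature suppression: scale features toward the zero baseline instead of
    # adding noise, so path samples stay near the natural image distribution.
    return alpha * x

def integrated_gradients(x, w, steps=64):
    # Integrate gradients over feature-suppressed samples along the path,
    # using the trapezoidal rule over alpha in [0, 1].
    alphas = np.linspace(0.0, 1.0, steps + 1)
    grads = np.stack([grad_f(suppress(x, a), w) for a in alphas])
    avg_grad = (grads[:-1] + grads[1:]).sum(axis=0) / (2 * steps)
    return x * avg_grad  # attribution = (input - baseline) * averaged gradient

rng = np.random.default_rng(0)
x = rng.normal(size=5)
w = rng.normal(size=5)
attr = integrated_gradients(x, w)
# Completeness sanity check: attributions sum to f(x) - f(baseline).
assert np.isclose(attr.sum(), f(x, w) - f(np.zeros_like(x), w))
```

For this zero baseline, suppression by scaling coincides with the standard IG straight-line path; the paper's semi-optimal sampling generalizes which features are suppressed, while the integration step stays the same.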