🤖 AI Summary
This work addresses the brittleness of standard Integrated Gradients (IG), whose explanations depend on heuristic baselines, straight-line paths in input space, discretization, and saturation. The authors propose Fisher–Rao Integrated Gradients (FRInGe), an attribution framework defined in the space of predictive distributions. It takes the maximum-entropy (uniform) predictive distribution as the reference, builds the attribution trajectory along a Fisher–Rao geodesic on the probability simplex, and integrates input gradients along the corresponding input-space path. Linear interpolation in input space is thus replaced by geodesic interpolation in distribution space. The input-space trajectory is obtained through the pullback Fisher metric and stabilized by KL and Euclidean trust regions. Across six ImageNet architectures, the method most clearly improves calibration-oriented attribution metrics, especially MAS, while remaining competitive on perturbation AUC and infidelity.
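The geodesic in distribution space has a simple closed form. Below is a minimal Python sketch, not the authors' code. It relies on the standard square-root embedding, under which the Fisher–Rao geodesic between two categorical distributions is a great-circle arc. Interpolating a prediction toward the uniform reference is therefore a spherical linear interpolation of square-root probabilities. The names `fisher_rao_geodesic` and `fisher_rao_distance` are illustrative.

```python
import math

def fisher_rao_geodesic(p, q, t):
    """Point at fraction t in [0, 1] along the Fisher–Rao geodesic from p to q.

    The square-root map s_i = sqrt(p_i) sends the probability simplex onto the
    positive orthant of the unit sphere, where Fisher–Rao geodesics are great
    circles, so the path is a spherical linear interpolation (slerp).
    """
    sp = [math.sqrt(v) for v in p]
    sq = [math.sqrt(v) for v in q]
    # Bhattacharyya coefficient = cosine of the angle between sqrt(p) and sqrt(q).
    theta = math.acos(min(1.0, sum(a * b for a, b in zip(sp, sq))))
    if theta < 1e-12:  # p and q coincide; nothing to interpolate
        return list(p)
    w0 = math.sin((1 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    # Squaring the unit-norm slerp point maps it back onto the simplex.
    return [(w0 * a + w1 * b) ** 2 for a, b in zip(sp, sq)]

def fisher_rao_distance(p, q):
    """Closed form 2 * arccos(sum_i sqrt(p_i * q_i)) for categorical distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))

# Usage: move a confident 3-class prediction toward the uniform
# (maximum-entropy) reference.
p = [0.9, 0.05, 0.05]
uniform = [1 / 3] * 3
for t in (0.0, 0.5, 1.0):
    print(t, [round(v, 3) for v in fisher_rao_geodesic(p, uniform, t)])
```

The path has constant Fisher–Rao speed: the distance from `p` to the point at fraction `t` is `t` times the full distance. Equally spaced values of `t` therefore give equally spaced points in distribution space, unlike a straight line in input space.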
📝 Abstract
Gradient-based attribution methods are model-faithful and scalable, but Integrated Gradients (IG) can be brittle because explanations depend on heuristic baselines, straight-line paths, discretization, and saturation. We propose Fisher–Rao Integrated Gradients (FRInGe), which defines both the reference and the interpolation schedule in predictive distribution space. FRInGe replaces input baselines with a maximum-entropy predictive reference and follows a Fisher–Rao geodesic on the probability simplex. The corresponding input-space trajectory is realized through the pullback Fisher metric and stabilized by KL and Euclidean trust regions; attributions are obtained by integrating input gradients along this trajectory. Across six ImageNet architectures, FRInGe most clearly improves calibration-oriented attribution metrics, especially MAS scores, while remaining competitive on perturbation AUC and infidelity.
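The abstract does not say exactly how the input-space trajectory is computed. The sketch below is therefore one hypothetical reading under stated assumptions, not FRInGe itself. It makes the following choices:

- A toy linear-softmax classifier stands in for an ImageNet model.
- Each step is a damped natural-gradient step on `KL(q || p(x))` under the pullback Fisher metric `W^T (diag(p) - p p^T) W`, toward the current geodesic target `q`.
- The step is clipped to a Euclidean radius, then halved until the KL change between consecutive predictions falls below a threshold.
- Input gradients of the target-class probability are integrated along the resulting polyline with Simpson's rule.

All names and hyperparameters (`pullback_step`, `fr_attributions`, `damping`, `radius`, `max_kl`, `n_targets`, `inner`) are illustrative.

```python
import math
# Illustrative sketch only; not the FRInGe reference implementation.

def predict(W, x):
    # Hypothetical toy classifier: logits z = W x, probabilities p = softmax(z).
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def grad_prob(W, x, c):
    # Input gradient of p_c for the toy model: W^T [p_c (e_c - p)].
    p = predict(W, x)
    dz = [p[c] * ((j == c) - p[j]) for j in range(len(p))]
    return [sum(W[j][i] * dz[j] for j in range(len(W))) for i in range(len(x))]

def geodesic(p, q, t):
    # Fisher–Rao geodesic on the simplex via the square-root (sphere) embedding.
    sp, sq = [math.sqrt(v) for v in p], [math.sqrt(v) for v in q]
    th = math.acos(min(1.0, sum(a * b for a, b in zip(sp, sq))))
    if th < 1e-12:
        return list(q)
    return [((math.sin((1 - t) * th) * a + math.sin(t * th) * b) / math.sin(th)) ** 2
            for a, b in zip(sp, sq)]

def solve(A, b):
    # Gaussian elimination with partial pivoting; adequate for tiny systems.
    n = len(b)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x

def pullback_step(W, x, q, damping=1e-2, radius=0.1, max_kl=0.05):
    p = predict(W, x)
    C, d = len(p), len(x)
    # Euclidean gradient of KL(q || p(x)) w.r.t. x is W^T (p - q).
    g = [sum(W[a][i] * (p[a] - q[a]) for a in range(C)) for i in range(d)]
    # Pullback Fisher metric G = W^T (diag(p) - p p^T) W, plus damping.
    F = [[(a == b) * p[a] - p[a] * p[b] for b in range(C)] for a in range(C)]
    G = [[sum(W[a][i] * F[a][b] * W[b][j] for a in range(C) for b in range(C))
          + damping * (i == j) for j in range(d)] for i in range(d)]
    step = solve(G, [-v for v in g])  # damped natural-gradient direction
    norm = math.sqrt(sum(v * v for v in step))
    if norm > radius:  # Euclidean trust region
        step = [v * radius / norm for v in step]
    for _ in range(30):  # KL trust region: halve until the prediction moves little
        x_new = [a + b for a, b in zip(x, step)]
        if kl(predict(W, x_new), p) <= max_kl:
            return x_new
        step = [v / 2 for v in step]
    return list(x)

def fr_attributions(W, x, c, n_targets=32, inner=5):
    p0 = predict(W, x)
    ref = [1.0 / len(p0)] * len(p0)  # maximum-entropy (uniform) reference
    path = [list(x)]
    for k in range(1, n_targets + 1):
        q = geodesic(p0, ref, k / n_targets)
        for _ in range(inner):
            path.append(pullback_step(W, path[-1], q))
    attr = [0.0] * len(x)
    for a, b in zip(path, path[1:]):
        mid = [(u + v) / 2 for u, v in zip(a, b)]
        ga, gm, gb = grad_prob(W, a, c), grad_prob(W, mid, c), grad_prob(W, b, c)
        for i in range(len(x)):
            # Simpson's rule per segment; sign chosen so sum(attr) ~ F(x) - F(end).
            attr[i] += (ga[i] + 4 * gm[i] + gb[i]) / 6 * (a[i] - b[i])
    return attr, path

W = [[1.0, -0.5, 0.3], [-0.4, 0.8, 0.2], [0.1, 0.2, -0.6]]
x = [2.0, -1.0, 0.5]
attr, path = fr_attributions(W, x, c=0)
gap = predict(W, x)[0] - predict(W, path[-1])[0]
print([round(v, 4) for v in attr], round(sum(attr), 6), round(gap, 6))
```

The attributions are a line integral of the gradient along the path. They therefore sum, up to quadrature error, to the class probability at `x` minus its value at the end of the trajectory. This is the same completeness property IG has for its straight-line path; the trajectory's shape only changes how that difference is split across input features.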