🤖 AI Summary
Existing XAI methods face two major bottlenecks: strong architectural dependence (e.g., gradient- or activation-based approaches) or prohibitive computational overhead (e.g., perturbation-based methods). To address these limitations, we propose Foveation-based Explanations (FovEx), the first model-agnostic explanation paradigm inspired by human foveated vision, in which spatial resolution is highest at the center of gaze and falls off toward the periphery. FovEx unifies explanations across both Vision Transformers (ViTs) and CNNs by integrating biologically grounded visual modeling, multi-scale foveated sampling, and perturbation-response analysis, complemented by cognition-aligned evaluation metrics such as Normalized Scanpath Saliency (NSS). Extensive experiments show that FovEx achieves state-of-the-art performance on 4 out of 5 mainstream evaluation metrics for ViTs and 3 out of 5 for CNNs. Notably, its NSS score improves by 14% over RISE and by 203% over Grad-CAM, significantly strengthening the alignment between saliency maps and human eye-tracking patterns and thereby narrowing the semantic gap between machine reasoning and human perception.
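To make the headline metric concrete, here is a minimal sketch of how NSS is commonly computed: the saliency map is z-scored (zero mean, unit variance), and the score is the mean z-value at human fixation locations, so higher values mean closer agreement with eye-tracking data. This is a generic illustration under the standard NSS definition, not the authors' implementation; the array shapes and function name are assumptions.

```python
import numpy as np

def nss(saliency_map: np.ndarray, fixation_mask: np.ndarray) -> float:
    """Normalized Scanpath Saliency.

    saliency_map : 2-D array of saliency values (any scale).
    fixation_mask: 2-D binary array of the same shape, 1 where a
                   human fixation landed.
    Returns the mean of the z-scored saliency map at fixated pixels.
    """
    # Z-score the map so NSS is comparable across methods and images.
    z = (saliency_map - saliency_map.mean()) / saliency_map.std()
    # Average the normalized saliency at fixation locations.
    return float(z[fixation_mask.astype(bool)].mean())

# Toy example: a map peaked exactly where the (single) fixation is
# yields a large positive NSS; fixations spread uniformly give ~0.
sal = np.zeros((5, 5))
sal[2, 2] = 1.0
fix = np.zeros((5, 5))
fix[2, 2] = 1.0
print(nss(sal, fix))  # positive: the map predicts the fixation well
```

A chance-level saliency map scores around 0 by construction, which is why NSS gains (e.g., +14% over RISE) translate directly into better agreement with where humans actually look.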
📝 Abstract
We introduce Foveation-based Explanations (FovEx), a novel human-inspired visual explainability (XAI) method for deep neural networks. FovEx achieves state-of-the-art performance on both transformer models (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility. Furthermore, we show that the explanation maps produced by FovEx align with human gaze patterns (+14% in NSS compared to RISE, +203% in NSS compared to Grad-CAM), strengthening our confidence in FovEx's ability to close the interpretation gap between humans and machines.