🤖 AI Summary
Existing XAI methods face two major bottlenecks: strong architectural dependence (e.g., gradient- or activation-based approaches) or prohibitive computational overhead (e.g., perturbation-based methods). To address these limitations, we propose Foveation-based Explanations (FovEx), the first model-agnostic explanation paradigm inspired by human foveated vision, in which spatial resolution is highest at the center of gaze and falls off toward the periphery. FovEx unifies explanations across both Vision Transformers (ViTs) and CNNs by integrating biologically grounded visual modeling, multi-scale foveated sampling, and perturbation-response analysis, complemented by cognition-aligned evaluation metrics such as Normalized Scanpath Saliency (NSS). Extensive experiments show that FovEx achieves state-of-the-art performance on 4 out of 5 mainstream evaluation metrics for ViTs and 3 out of 5 for CNNs. Notably, its NSS score improves by 14% over RISE and by 203% over Grad-CAM, significantly strengthening the alignment between saliency maps and human eye-tracking patterns and thereby narrowing the semantic gap between machine reasoning and human perception.
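To make the headline metric concrete, here is a minimal sketch of how NSS is commonly computed: the saliency map is z-scored (zero mean, unit variance), and the score is the mean z-value at human fixation locations, so higher values mean closer agreement with eye-tracking data. This is a generic illustration under the standard NSS definition, not the authors' implementation; the array shapes and function name are assumptions.

```python
import numpy as np

def nss(saliency_map: np.ndarray, fixation_mask: np.ndarray) -> float:
    """Normalized Scanpath Saliency.

    saliency_map : 2-D array of saliency values (any scale).
    fixation_mask: 2-D binary array of the same shape, 1 where a
                   human fixation landed.
    Returns the mean of the z-scored saliency map at fixated pixels.
    """
    # Z-score the map so NSS is comparable across methods and images.
    z = (saliency_map - saliency_map.mean()) / saliency_map.std()
    # Average the normalized saliency at fixation locations.
    return float(z[fixation_mask.astype(bool)].mean())

# Toy example: a map peaked exactly where the (single) fixation is
# yields a large positive NSS; fixations spread uniformly give ~0.
sal = np.zeros((5, 5))
sal[2, 2] = 1.0
fix = np.zeros((5, 5))
fix[2, 2] = 1.0
print(nss(sal, fix))  # positive: the map predicts the fixation well
```

A chance-level saliency map scores around 0 by construction, which is why NSS gains (e.g., +14% over RISE) translate directly into better agreement with where humans actually look.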
📝 Abstract
We introduce Foveation-based Explanations (FovEx), a novel human-inspired visual explainability (XAI) method for deep neural networks. FovEx achieves state-of-the-art performance on both transformer models (on 4 out of 5 metrics) and convolutional models (on 3 out of 5 metrics), demonstrating its versatility. Furthermore, we show that the explanation maps produced by FovEx align with human gaze patterns (+14% in NSS compared to RISE, +203% in NSS compared to Grad-CAM), strengthening our confidence in FovEx's ability to close the interpretation gap between humans and machines.