🤖 AI Summary
Existing attribution methods operate at the single-sample level, which limits their usefulness for task-level or dataset-scale interpretability, particularly when the goal is to identify critical input regions that remain stable across samples. To address this, the authors propose Integrated Gradient Correlation (IGC), a dataset-wise attribution framework. IGC relates cross-sample correlations between feature attributions and model predictions to a model prediction score, enabling region-specific analysis through direct summation and statistical aggregation over associated components. The method applies to both scalar prediction tasks (e.g., fMRI decoding) and classification tasks (e.g., MNIST). IGC is validated on the Natural Scenes Dataset (NSD) and MNIST, where it uncovers spatially selective decision patterns of deep models and supports large-scale, reproducible attribution analysis.
📝 Abstract
Attribution methods are primarily designed to study the distribution of input component contributions to individual model predictions. However, some research applications require a summary of attribution patterns across the entire dataset to facilitate the interpretability of the scrutinized models. In this paper, we present a new method called Integrated Gradient Correlation (IGC) that relates dataset-wise attributions to a model prediction score and enables region-specific analysis by a direct summation over associated components. We demonstrate our method on scalar predictions with the study of image feature representation in the brain from fMRI neural signals and the estimation of neural population receptive fields (NSD dataset), as well as on categorical predictions with the investigation of handwritten digit recognition (MNIST dataset). The resulting IGC attributions show selective patterns, revealing underlying model strategies coherent with their respective objectives.
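To make the idea of a dataset-wise attribution concrete, here is a minimal, hedged sketch in NumPy. It is *not* the paper's exact IGC formula: it assumes a toy linear model with a known gradient, approximates per-sample Integrated Gradients with a midpoint Riemann sum over the straight-line path from a zero baseline, and then aggregates across the dataset by correlating each input component's attribution with the model prediction (one illustrative choice of cross-sample statistic). The names `integrated_gradients`, `model_grad`, and `igc` are this sketch's own, not from the paper.

```python
import numpy as np

def integrated_gradients(model_grad, x, baseline, steps=64):
    # Midpoint Riemann approximation of the IG path integral for one sample:
    # IG_i(x) = (x_i - b_i) * mean over alpha of dF/dx_i(b + alpha*(x - b))
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += model_grad(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy scalar model y = w . x, so the gradient is constant and IG is exact.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
model = lambda X: X @ w
model_grad = lambda x: w  # gradient of a linear model w.r.t. a single sample

X = rng.normal(size=(200, 5))          # a small synthetic "dataset"
baseline = np.zeros(5)
ig = np.stack([integrated_gradients(model_grad, x, baseline) for x in X])
preds = model(X)

# Dataset-wise aggregation (illustrative): per-component Pearson correlation
# between attributions and predictions across all samples.
igc = np.array([np.corrcoef(ig[:, i], preds)[0, 1] for i in range(X.shape[1])])
```

For this linear toy, `ig[:, i]` reduces to `w[i] * X[:, i]`, so `igc` simply measures how strongly each component's contribution co-varies with the prediction across the dataset; the actual IGC method is defined so that region-level scores can be obtained by direct summation over the associated components.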