🤖 AI Summary
Existing pixel-wise attribution methods are highly sensitive to minor input perturbations: attribution maps can change drastically even when the model's prediction does not, which severely undermines their trustworthiness. To address this, we propose the first certification framework that provides provable pixel-level robustness guarantees for arbitrary black-box attribution methods. Our approach leverages randomized smoothing to reformulate attribution robustness verification as a binary segmentation problem and introduces three principled evaluation metrics: certified robustness, localization accuracy, and fidelity. Key technical components include attribution-map sparsification and smoothing, coupled with certification against ℓ₂-bounded perturbations. We conduct comprehensive experiments across five ImageNet models and twelve attribution methods. The results demonstrate that our certified attribution maps achieve strong robustness guarantees while preserving interpretability and high fidelity, enabling reliable use in downstream trustworthy-AI applications.
📝 Abstract
Post-hoc attribution methods aim to explain deep learning predictions by highlighting influential input pixels. However, these explanations are highly non-robust: small, imperceptible input perturbations can drastically alter the attribution map while maintaining the same prediction. This vulnerability undermines their trustworthiness and calls for rigorous robustness guarantees of pixel-level attribution scores. We introduce the first certification framework that guarantees pixel-level robustness for any black-box attribution method using randomized smoothing. By sparsifying and smoothing attribution maps, we reformulate the task as a segmentation problem and certify each pixel's importance against $\ell_2$-bounded perturbations. We further propose three evaluation metrics to assess certified robustness, localization, and faithfulness. An extensive evaluation of 12 attribution methods across 5 ImageNet models shows that our certified attributions are robust, interpretable, and faithful, enabling reliable use in downstream tasks. Our code is available at https://github.com/AlaaAnani/certified-attributions.
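To make the pipeline concrete, here is a minimal sketch of the sparsify-smooth-certify idea described above. This is not the authors' implementation: the `attribution_fn` interface, the top-k sparsification rule, the Hoeffding confidence bound, and all parameter names are illustrative assumptions; the per-pixel radius uses the standard randomized-smoothing form $R = \sigma\,\Phi^{-1}(\underline{p})$ applied independently at each pixel.

```python
import numpy as np
from scipy.stats import norm


def certify_attribution(image, attribution_fn, sigma=0.25,
                        n_samples=100, top_k_frac=0.1, alpha=0.05):
    """Per-pixel certification of a sparsified attribution map via
    randomized smoothing (illustrative sketch, not the paper's exact method).

    attribution_fn: hypothetical black-box map, image -> scores of shape (H, W).
    Returns the majority-vote binary importance mask and a per-pixel
    certified l2 radius (0 where the vote is not statistically significant).
    """
    H, W = image.shape[:2]
    votes = np.zeros((H, W))
    k = max(1, int(top_k_frac * H * W))
    for _ in range(n_samples):
        # smooth: evaluate the attribution under Gaussian input noise
        noisy = image + sigma * np.random.randn(*image.shape)
        attr = attribution_fn(noisy)
        # sparsify: binarize by keeping only the top-k most important pixels
        thresh = np.partition(attr.ravel(), -k)[-k]
        votes += (attr >= thresh)
    # empirical probability that each pixel is voted "important"
    p_hat = votes / n_samples
    mask = p_hat >= 0.5
    # crude lower confidence bound on the majority-class probability (Hoeffding);
    # a Clopper-Pearson bound would be tighter in practice
    p_major = np.maximum(p_hat, 1.0 - p_hat)
    p_lower = p_major - np.sqrt(np.log(1.0 / alpha) / (2.0 * n_samples))
    # certified l2 radius where the bound clears 1/2, else abstain (radius 0)
    radius = np.where(p_lower > 0.5,
                      sigma * norm.ppf(np.clip(p_lower, 0.5, 1.0 - 1e-9)),
                      0.0)
    return mask, radius
```

Treating each pixel's importance as a binary class turns the certification into exactly the segmentation-style problem the abstract describes: a pixel's label is certified within radius `R` whenever the smoothed vote is provably bounded away from 1/2.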