AI Summary
Gradient-based explainable AI (XAI) methods suffer from three key limitations: dependence on white-box model access, vulnerability to adversarial perturbations, and attribution drift away from the underlying data manifold. To address these, this paper proposes a derivative-free, manifold-constrained gradient approximation framework. The method integrates diffusion models with ensemble Kalman filtering (EnKF) to estimate gradients in a black-box setting, driven solely by model outputs and constrained to the intrinsic data manifold. Manifold projection ensures geometric consistency during optimization. Notably, the framework unifies counterfactual generation and feature attribution within a single architecture, jointly optimizing for explanation faithfulness and perceptual plausibility. Evaluations across multiple benchmarks demonstrate state-of-the-art performance on both tasks, with significant improvements in robustness, model fidelity, and perceptual alignment.
Abstract
Gradient-based methods are a prototypical family of explainability techniques, especially for image-based models. Nonetheless, they have several shortcomings: they (1) require white-box access to models, (2) are vulnerable to adversarial attacks, and (3) produce attributions that lie off the image manifold, leading to explanations that are neither faithful to the model nor well aligned with human perception. To overcome these challenges, we introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG), a novel method that provides a better basis for explaining a given neural network than the traditional gradient. Specifically, by leveraging ensemble Kalman filters and diffusion models, we derive a derivative-free approximation of the model's gradient projected onto the data manifold, requiring access only to the model's outputs. We demonstrate the effectiveness of FreeMCG by applying it to counterfactual generation and feature attribution, two tasks that have traditionally been treated as distinct. Through comprehensive evaluation on both tasks, we show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
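The derivative-free estimate described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's implementation: `enkf_gradient` uses the EnKF-style cross-covariance between input perturbations and black-box model outputs as a gradient surrogate, and `project_to_manifold` is a hypothetical stand-in for the diffusion-based manifold projection (all names and signatures here are assumptions).

```python
import numpy as np

def enkf_gradient(f, x, n_particles=64, sigma=0.1, rng=None):
    """Derivative-free gradient estimate using only model outputs f(x).

    EnKF-style surrogate: the cross-covariance between input perturbations
    and output deviations approximates sigma^2 * grad f(x), so dividing by
    sigma^2 recovers a gradient estimate without any backpropagation.
    """
    rng = np.random.default_rng(rng)
    eps = rng.normal(0.0, sigma, size=(n_particles, x.size))  # ensemble of perturbations
    ys = np.array([f(x + e) for e in eps])                    # black-box evaluations only
    return eps.T @ (ys - ys.mean()) / (n_particles * sigma**2)

def project_to_manifold(x, denoise):
    # Hypothetical stand-in for the diffusion-based projection: a single
    # denoising step pulls the iterate back toward the data manifold.
    return denoise(x)
```

For a linear model `f(z) = z @ w`, the estimate converges to `w` as the ensemble grows, which is a quick sanity check that the cross-covariance surrogate behaves like a gradient.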