🤖 AI Summary
To address critical challenges in deep learning interpretability—namely, unstable feature attributions and inconsistent explanations across instances—this paper proposes T-Explainer, a gradient-driven, model-agnostic, additive attribution framework grounded in Taylor expansion. T-Explainer integrates first-order Taylor approximation with localized gradient computation, ensuring strict local fidelity and attribution consistency. Its core contribution is a systematic application of Taylor expansion to additive attribution explanations, substantially improving explanation stability. Extensive experiments across multiple benchmark datasets demonstrate that T-Explainer outperforms state-of-the-art methods—including SHAP and Integrated Gradients—on stability metrics. The framework supports plug-and-play deployment and provides a reproducible quantitative evaluation pipeline alongside integrated visualization tools, enabling rigorous empirical validation and practical adoption.
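The core idea the summary describes — additive attributions derived from a first-order Taylor expansion around an instance — can be sketched as below. This is an illustrative sketch, not the paper's actual implementation: the function `taylor_attributions` and its finite-difference gradient estimate are assumptions chosen to keep the example model-agnostic.

```python
import numpy as np

def taylor_attributions(f, x, baseline, eps=1e-5):
    """Illustrative first-order Taylor attribution (NOT the official
    T-Explainer code). Attribution for feature i:
        phi_i ~= (df/dx_i at x) * (x_i - baseline_i)
    The gradient is estimated by central finite differences, so f can be
    any black-box scalar-valued function."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Central difference approximation of the partial derivative
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad * (x - baseline)

# Example: for a linear model the first-order expansion is exact, so the
# attributions sum to f(x) - f(baseline) (the local-accuracy property).
w = np.array([2.0, -1.0, 0.5])
f = lambda z: float(w @ z)
x = np.array([1.0, 3.0, 2.0])
baseline = np.zeros(3)
phi = taylor_attributions(f, x, baseline)
```

For nonlinear models the expansion is only locally faithful, which is why the paper pairs it with localized gradient computation; the linear case above simply makes the additive (local accuracy) property easy to verify.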
📝 Abstract
The development of machine learning applications has increased significantly in recent years, motivated by the remarkable ability of learning-powered systems to discover and generalize intricate patterns hidden in massive datasets. Modern learning models, while powerful, often exhibit a level of complexity that renders them opaque black boxes, lacking transparency and hindering our understanding of their decision-making processes. Opacity challenges the practical application of machine learning, especially in critical domains requiring informed decisions. Explainable Artificial Intelligence (XAI) addresses this challenge, unraveling the complexity of black boxes by providing explanations. Feature attribution/importance XAI stands out for its ability to delineate the significance of input features in predictions. However, most attribution methods have limitations, such as instability, where similar or even identical instances yield divergent explanations. This work introduces T-Explainer, a novel additive attribution explainer based on the Taylor expansion that offers desirable properties such as local accuracy and consistency. We demonstrate T-Explainer's effectiveness and stability over multiple runs in quantitative benchmark experiments against well-known attribution methods. Additionally, we provide several tools to evaluate and visualize explanations, turning T-Explainer into a comprehensive XAI framework.