Counterfactual explainability of black-box prediction models

📅 2024-11-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing interpretability methods rely predominantly on statistical associations and so fail to uncover the causal mechanisms underlying black-box models; when input variables are dependent, this leads to inaccurate attribution. This paper introduces “counterfactual explainability,” a new notion for causal attribution. It extends global sensitivity analysis to a counterfactual setting and defines a probability measure on an “explanation algebra” that covers both the totality of a set of input factors and their interactions. By combining functional ANOVA, Sobol's indices, and causal relationships modeled by a directed acyclic graph (DAG), the framework gives causal, decomposable, and quantifiable explanations of black-box models with dependent inputs. Two paradoxical examples show that associational explanations are generally inadequate.

📝 Abstract
It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole "explanation algebra"); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.
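For readers unfamiliar with the global sensitivity analysis that the abstract says is being extended, the following is a minimal sketch of a classical first-order Sobol index estimate. It is not the paper's counterfactual method. It assumes independent Uniform(0, 1) inputs and uses the standard pick-freeze Monte Carlo estimator; the names `first_order_sobol` and `toy_model` are hypothetical and chosen only for illustration.

```python
import numpy as np

def first_order_sobol(f, d, n=100_000, seed=0):
    """Pick-freeze estimate of first-order Sobol indices S_k = Var(E[Y | X_k]) / Var(Y).

    Assumption: the d inputs are independent Uniform(0, 1) variables.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))  # first independent sample
    B = rng.random((n, d))  # second independent sample
    yA, yB = f(A), f(B)

    # Center the outputs to reduce Monte Carlo variance.
    y_mean = np.mean(np.concatenate([yA, yB]))
    var_y = np.var(np.concatenate([yA, yB]))

    indices = np.empty(d)
    for k in range(d):
        AB = A.copy()
        AB[:, k] = B[:, k]  # A with column k taken from B
        yAB = f(AB)
        # yB and yAB share only input k, so this averages to Var(E[Y | X_k]).
        indices[k] = np.mean((yB - y_mean) * (yAB - yA)) / var_y
    return indices

def toy_model(x):
    # Hypothetical additive black box; the third input is unused.
    return x[:, 0] + 2.0 * x[:, 1]

S = first_order_sobol(toy_model, d=3)
```

For this additive toy model the exact indices are 0.2, 0.8, and 0, so the estimates in `S` should land close to those values. The estimator relies on independent inputs, which is the restriction the abstract's third point addresses.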

Problem

Research questions and friction points this paper is trying to address.

Develops counterfactual explainability for causal attribution in complex models
Extends global sensitivity analysis to dependent variables using causal graphs
Estimates causal mechanisms explaining income inequality by demographic factors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends sensitivity analysis to dependent variables
Uses directed acyclic graphs for causal relationships
Estimates counterfactual explainability under comonotonicity assumption
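To see why dependent inputs matter for the points above, here is a toy sketch. It is not one of the paper's paradoxical examples, and the setup is an assumption made for illustration: a hypothetical black box `model` ignores `x1`, but `x1` is correlated with `x2`. The associational quantity Var(E[Y | X1]) / Var(Y), estimated by the hypothetical helper `associational_share`, still credits `x1`, while changing `x1` with `x2` held fixed leaves the prediction unchanged.

```python
import numpy as np

def associational_share(x, y, bins=50):
    """Binned estimate of Var(E[Y | X]) / Var(Y) using equal-frequency bins."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    means = np.bincount(idx, weights=y, minlength=bins) / counts
    between = np.sum(counts * (means - y.mean()) ** 2) / len(y)
    return between / np.var(y)

def model(x1, x2):
    # Hypothetical black box that ignores x1 entirely.
    return x2 + 0.0 * x1

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)  # x2 depends on x1
y = model(x1, x2)

# Associational view: x1 appears to explain about half of the variance.
share_x1 = associational_share(x1, y)

# Changing x1 while holding x2 fixed does not move the prediction at all.
x1_new = rng.normal(size=n)
change = model(x1_new, x2) - y
```

In theory the associational share here is Var(X1) / Var(X2) = 0.5, even though the model never reads `x1`. Whether `x1` should receive credit depends on the causal story: if `x1` causes `x2`, changing `x1` would propagate to the prediction through `x2`. A causal graph is what makes that distinction explicit.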
Zijun Gao
Department of Data Science and Operations, University of Southern California
Qingyuan Zhao
University of Cambridge
Statistics, Causal Inference, Selective Inference