Counterfactual explainability of black-box prediction models

📅 2024-11-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing interpretability methods rely predominantly on statistical associations and therefore fail to uncover the causal mechanisms underlying black-box models; attributions become especially unreliable when input variables depend on one another. This paper introduces “counterfactual explainability,” a new notion for causal attribution motivated by genetic heritability in twin studies. It extends global sensitivity analysis, namely functional ANOVA and Sobol's indices, to a counterfactual setting and is defined as a probability measure on an “explanation algebra,” so it covers the total contribution of a set of inputs as well as their interactions. For dependent inputs, a directed acyclic graph (DAG) models the causal relationships among them, which brings causal mechanisms into the explanation. Two paradoxical examples show that association-based explanations are generally inadequate.

📝 Abstract

It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole “explanation algebra”); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.
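
For readers unfamiliar with the global sensitivity analysis that the abstract builds on, the following is a minimal sketch of classical Sobol's indices for independent inputs. It is not the paper's counterfactual estimator; it only illustrates the associational baseline being extended. The function names `sobol_indices` and `model`, the uniform input distribution, and the toy model are all assumptions made for illustration. The first-order index uses the Saltelli (2010) pick-freeze estimator and the total index uses the Jansen estimator.

```python
import numpy as np


def sobol_indices(f, d, n=100_000, seed=0):
    """Estimate first-order and total Sobol indices of f.

    Assumes d independent inputs, each uniform on (0, 1).
    f maps an (n, d) array to an (n,) array of outputs.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))  # base sample
    B = rng.random((n, d))  # independent resample
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))

    first = np.empty(d)
    total = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]  # replace only input i, keep the rest from A
        fAB = f(AB)
        # Saltelli (2010): variance explained by input i alone
        first[i] = np.mean(fB * (fAB - fA)) / var_y
        # Jansen: variance lost when every input except i is held fixed
        total[i] = 0.5 * np.mean((fA - fAB) ** 2) / var_y
    return first, total


def model(X):
    """Hypothetical toy model: one main effect plus one pure interaction."""
    Z = 2.0 * X - 1.0  # rescale inputs to (-1, 1)
    return Z[:, 0] + Z[:, 1] * Z[:, 2]


first, total = sobol_indices(model, d=3)
print("first-order:", np.round(first, 2))
print("total:      ", np.round(total, 2))
```

For this toy model the exact values are first-order indices of 0.75, 0, 0 and total indices of 0.75, 0.25, 0.25, so the estimates should land close to those numbers. The second and third inputs matter only through their interaction, which the first-order index misses and the total index picks up; this is the distinction between main effects and interactions that the abstract's point (2) refers to.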

Problem

Research questions and friction points this paper is trying to address.

Develops counterfactual explainability for causal attribution in complex models

Extends global sensitivity analysis to dependent variables using causal graphs

Estimates causal mechanisms explaining income inequality by demographic factors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends sensitivity analysis to dependent variables

Uses directed acyclic graphs for causal relationships

Estimates counterfactual explainability under comonotonicity assumption
