Explaining The Behavior Of Black-Box Prediction Algorithms With Causal Learning

📅 2020-06-03
🏛️ arXiv.org
📈 Citations: 15
Influential: 3
🤖 AI Summary
Problem: Existing interpretability methods for black-box image classifiers lack causal grounding and cannot distinguish genuinely causal features from spurious associations induced by unobserved confounders. Method: The paper proposes an explainability framework grounded in interventionist, counterfactual causal reasoning. It learns the structure of a causal graph over the prediction outcome and a set of high-level semantic features abstracted from raw pixels, allows for arbitrary unmeasured confounding among these variables, and identifies true "difference-makers": features whose counterfactual interventions would alter the prediction. Contribution/Results: The work introduces the first systematic application of counterfactual causal explanation to black-box model auditing, enabling verifiable identification of causal drivers. Experiments on image classification tasks demonstrate improved causal feature identification accuracy, supporting rigorous algorithmic attribution analysis and trustworthy model evaluation.
📝 Abstract
We propose to explain the behavior of black-box prediction methods (e.g., deep neural networks trained on image pixel data) using causal graphical models. Specifically, we explore learning the structure of a causal graph where the nodes represent prediction outcomes along with a set of macro-level "interpretable" features, while allowing for arbitrary unmeasured confounding among these variables. The resulting graph may indicate which of the interpretable features, if any, are possible causes of the prediction outcome and which may be merely associated with prediction outcomes due to confounding. The approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors which are "difference-makers" in an interventionist sense. The resulting analysis may be useful in algorithm auditing and evaluation, by identifying features which make a causal difference to the algorithm's output.
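The abstract's distinction between features that cause the prediction and features that are merely associated with it due to confounding can be illustrated with a toy conditional-independence check, the primitive that constraint-based causal structure learning builds on. This is a minimal sketch under assumed synthetic data, not the paper's method; all variable names (`u`, `a`, `b`, `y`) are illustrative.

```python
import numpy as np

# Toy setup (illustrative, not from the paper): feature b is a genuine
# cause of the prediction score y, while feature a is associated with y
# only through a shared latent factor u (an unmeasured confounder).
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)            # latent common factor (unobserved)
b = u + rng.normal(size=n)        # causal "difference-maker" feature
a = u + rng.normal(size=n)        # spuriously associated feature
y = 2.0 * b + rng.normal(size=n)  # black-box prediction score

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z."""
    zmat = np.column_stack([z, np.ones_like(z)])
    rx = x - zmat @ np.linalg.lstsq(zmat, x, rcond=None)[0]
    ry = y - zmat @ np.linalg.lstsq(zmat, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

marginal_ay = float(np.corrcoef(a, y)[0, 1])  # sizeable: a *looks* relevant
ay_given_b = partial_corr(a, y, b)            # near zero: association vanishes
by_given_a = partial_corr(b, y, a)            # stays large: b is a real cause
```

Conditioning on the causal feature screens off the spurious one, which is how a learned graph can flag `a` as "merely associated" while retaining `b` as a possible cause of the prediction outcome.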
Problem

Research questions and friction points this paper is trying to address.

Explainable AI
Complex Prediction Algorithms
Feature Importance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Learning
Complex Prediction Systems
Intuitive Interpretability