🤖 AI Summary
This work addresses the limitations of existing counterfactual explanation methods, which are typically confined to local, single-instance explanations of one fixed type, lacking both a global perspective and a systematic categorization. The authors propose an axiomatic framework that characterizes the capability boundaries of counterfactual explainers through formally defined desiderata, and they establish impossibility theorems showing that certain combinations of these axioms cannot be satisfied simultaneously. Building on this foundation, they construct a one-to-one correspondence between five compatible subsets of axioms and five fundamentally distinct types of counterfactual explanations. Through representation theorems, formal logic, and complexity analysis, the study provides the first unified classification of mainstream counterfactual explainers, clarifying their behavioral properties and computational complexity.
📝 Abstract
Explaining autonomous and intelligent systems is critical to improving trust in their decisions. Counterfactuals have emerged as one of the most compelling forms of explanation. They address "why not" questions by revealing how decisions could be altered. Despite the growing literature, most existing explainers focus on a single type of counterfactual and are restricted to local explanations of individual instances. There has been no systematic study of alternative counterfactual types, nor of global counterfactuals that shed light on a system's overall reasoning process. This paper addresses these two gaps by introducing an axiomatic framework built on a set of desirable properties for counterfactual explainers. It proves impossibility theorems showing that no single explainer can satisfy certain axiom combinations simultaneously, and it fully characterizes all compatible axiom sets. Representation theorems then establish five one-to-one correspondences between specific subsets of axioms and the families of explainers that satisfy them. Each family gives rise to a distinct type of counterfactual explanation, uncovering five fundamentally different types of counterfactuals. Some of these correspond to local explanations, while others capture global explanations. Finally, the framework situates existing explainers within this taxonomy, formally characterizes their behavior, and analyzes the computational complexity of generating such explanations.
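To make the notion of a local "why not" explanation concrete, here is a minimal illustrative sketch (not from the paper): a toy classifier and a search for the smallest single-feature change that flips its decision. The decision rule, feature names, and thresholds are all hypothetical, chosen only to show the shape of a counterfactual explanation.

```python
# Illustrative sketch, NOT the paper's method: a local counterfactual search
# for a hypothetical loan-approval rule. Everything here is a toy assumption.

def classify(income, debt):
    """Toy decision rule: approve iff income minus debt reaches a threshold."""
    return "approve" if income - debt >= 50 else "deny"

def local_counterfactual(income, debt, step=1, max_delta=100):
    """Find the smallest income increase that flips a denial to an approval."""
    if classify(income, debt) == "approve":
        return None  # decision is already positive; nothing to explain
    for delta in range(step, max_delta + 1, step):
        if classify(income + delta, debt) == "approve":
            return {"income": income + delta}  # minimally altered instance
    return None  # no counterfactual found within the search budget

# "Why was the applicant denied?" -> "With income 70 instead of 40
# (debt unchanged at 20), the loan would have been approved."
print(classify(40, 20))              # deny
print(local_counterfactual(40, 20))  # {'income': 70}
```

A global counterfactual, by contrast, would summarize such altered-instance patterns across the whole input space rather than for one applicant at a time.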