🤖 AI Summary
This paper addresses the challenges of detecting individual discrimination—particularly unidimensional and multidimensional intersectional discrimination—in classifier decisions, as well as the tendency of implicit bias to evade detection under conventional counterfactual fairness conditions. To this end, the authors propose Counterfactual Situation Testing (CST), a novel method that generates counterfactual samples with joint perturbations of protected attributes (e.g., gender, race) and quantifies decision changes via k-nearest-neighbor matching and causal modeling to systematically identify both unidimensional and intersectional discrimination. CST is the first to formally embed the legal situation-testing paradigm into a counterfactual causal framework. Empirically, the paper demonstrates that multiple unidimensional discriminations cannot substitute for intersectional discrimination, confirming its irreducibility. Moreover, CST provides interpretable confidence intervals for counterfactual fairness assessment. Experiments on synthetic and real-world datasets show that CST significantly improves discrimination detection rates—even in models satisfying standard counterfactual fairness criteria—thereby uncovering latent discriminatory behavior.
📝 Abstract
We present counterfactual situation testing (CST), a causal data mining framework for detecting individual discrimination in a dataset of classifier decisions. CST answers the question "what would have been the model outcome had the individual, or complainant, been of a different protected status?" It extends the legally grounded situation testing (ST) of Thanh et al. (2011) by operationalizing the notion of fairness given the difference via counterfactual reasoning. ST finds for each complainant similar protected and non-protected instances in the dataset; constructs, respectively, a control and test group; and compares the groups such that a difference in outcomes implies a potential case of individual discrimination. CST, instead, avoids this idealized comparison by establishing the test group on the complainant's generated counterfactual, which reflects how the protected attribute, when changed, influences other seemingly neutral attributes of the complainant. Under CST we test for discrimination for each complainant by comparing similar individuals within each group but dissimilar individuals across groups. We consider single (e.g., gender) and multidimensional (e.g., gender and race) discrimination testing. For multidimensional discrimination we study multiple and intersectional discrimination and, as feared by legal scholars, find evidence that the former fails to account for the latter kind. Using a k-nearest neighbor implementation, we showcase CST on synthetic and real data. Experimental results show that CST uncovers a higher number of cases than ST, even when the model is counterfactually fair. In fact, CST extends counterfactual fairness (CF) of Kusner et al. (2017) by equipping CF with confidence intervals.
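To make the contrast between ST's idealized comparison and CST's counterfactual comparison concrete, here is a minimal, hypothetical sketch. Everything in it is an illustrative assumption, not the paper's implementation: a single protected attribute `A`, a single "neutral" feature `X` that `A` causally shifts by a known amount, a decision that looks at `X` only, and simple 1-D k-nearest-neighbor matching.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: protected attribute A (1 = protected),
# a neutral feature X that A causally lowers (assumed effect: -0.8),
# and a decision Y that thresholds X only, so it looks A-blind.
n = 500
A = rng.integers(0, 2, n)
X = rng.normal(0.0, 1.0, n) - 0.8 * A
Y = (X > 0.0).astype(int)

def knn_rate(x_query, mask, k=15):
    """Positive-decision rate among the k nearest neighbors (in X)
    of x_query, restricted to the rows selected by mask."""
    idx = np.where(mask)[0]
    nearest = idx[np.argsort(np.abs(X[idx] - x_query))[:k]]
    return Y[nearest].mean()

# Complainant: protected, with feature value x_c.
x_c = -0.4
ctr = knn_rate(x_c, A == 1)        # control group: similar protected peers

# ST-style test group: non-protected peers with the SAME x_c
# (the idealized comparison, which ignores A's effect on X).
tst_st = knn_rate(x_c, A == 0)

# CST-style test group: built around the complainant's counterfactual,
# whose X is adjusted for the assumed effect of flipping A (+0.8 here).
x_cf = x_c + 0.8
tst_cst = knn_rate(x_cf, A == 0)

print(f"control={ctr:.2f}  ST test={tst_st:.2f}  CST test={tst_cst:.2f}")
```

Under this toy causal model the decision rule is identical for both groups, so ST's test group (same `X`, different `A`) shows roughly the same acceptance rate as the control group and flags nothing; CST's test group, built on the counterfactual `X`, exposes the gap that the complainant's protected status induced through the neutral feature.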