On the Definition and Detection of Cherry-Picking in Counterfactual Explanations

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the risk of "cherry-picking" in counterfactual explanations—where explanation providers selectively present favorable instances to obscure model flaws—a problem that lacks formal characterization and reliable detection methods. The work introduces the first formal framework for this behavior, systematically analyzing its detectability under varying levels of auditor access. Through empirical evaluation integrating counterfactual generation algorithms, statistical hypothesis testing, and standard explanation quality metrics (e.g., proximity, plausibility, sparsity), the research reveals that the inherent diversity of counterfactuals often masks the impact of cherry-picking, rendering manipulated explanations statistically indistinguishable from legitimate ones. This finding underscores the limitations of post-hoc detection and advocates a shift toward governance paradigms based on ex-ante constraints.
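To make the formal setup concrete: the paper defines cherry-picking as selection from an admissible explanation space according to a provider's utility function. The sketch below illustrates that definition under stated assumptions; the `proximity` and `sparsity` metrics are standard quality measures, but the admissible set here is a random stand-in for a real counterfactual generator, and names such as `cherry_pick` and `narrative_utility` are hypothetical, not the paper's implementation.

```python
import numpy as np

def proximity(x, x_cf):
    # L2 distance between the original instance and a counterfactual (lower = closer)
    return np.linalg.norm(x - x_cf)

def sparsity(x, x_cf, tol=1e-6):
    # Number of features the counterfactual changes (lower = sparser)
    return int(np.sum(np.abs(x - x_cf) > tol))

def cherry_pick(x, admissible_cfs, utility):
    # The provider reports only the counterfactual that maximises its own
    # utility, chosen from the admissible explanation space.
    return max(admissible_cfs, key=lambda cf: utility(x, cf))

rng = np.random.default_rng(0)
x = rng.normal(size=5)

# Stand-in for the admissible explanation space: in practice this would be
# the output set of a counterfactual generation algorithm for instance x.
admissible_cfs = [x + rng.normal(scale=0.5, size=5) for _ in range(20)]

# A narrative-driven utility: prefer the closest counterfactual, which makes
# the model's decision boundary look tighter than it is.
narrative_utility = lambda inst, cf: -proximity(inst, cf)

chosen = cherry_pick(x, admissible_cfs, narrative_utility)
print(f"reported proximity: {proximity(x, chosen):.3f}, sparsity: {sparsity(x, chosen)}")
```

The key degree of freedom is that every element of `admissible_cfs` is a valid explanation, so the reported one is defensible in isolation; only the selection rule is manipulative.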

📝 Abstract
Counterfactual explanations are widely used to communicate how inputs must change for a model to alter its prediction. For a single instance, many valid counterfactuals can exist, which leaves open the possibility for an explanation provider to cherry-pick explanations that better suit a narrative of their choice, highlighting favourable behaviour and withholding examples that reveal problematic behaviour. We formally define cherry-picking for counterfactual explanations in terms of an admissible explanation space, specified by the generation procedure, and a utility function. We then study to what extent an external auditor can detect such manipulation. Considering three levels of access to the explanation process: full procedural access, partial procedural access, and explanation-only access, we show that detection is extremely limited in practice. Even with full procedural access, cherry-picked explanations can remain difficult to distinguish from non-cherry-picked explanations, because the multiplicity of valid counterfactuals and flexibility in the explanation specification provide sufficient degrees of freedom to mask deliberate selection. Empirically, we demonstrate that this variability often exceeds the effect of cherry-picking on standard counterfactual quality metrics such as proximity, plausibility, and sparsity, making cherry-picked explanations statistically indistinguishable from baseline explanations. We argue that safeguards should therefore prioritise reproducibility, standardisation, and procedural constraints over post-hoc detection, and we provide recommendations for algorithm developers, explanation providers, and auditors.
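The indistinguishability result can be illustrated with a two-sample test on a quality metric, which is the kind of check an auditor with explanation-only access could run. The abstract mentions statistical testing only generically; the Mann-Whitney U test and the synthetic metric distributions below are assumptions chosen for illustration, not the authors' exact protocol.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)

# Synthetic proximity scores for two pools of explanations. The natural spread
# (scale=0.4) dwarfs the small shift that selection introduces (loc 1.0 vs 0.95),
# mirroring the finding that counterfactual multiplicity masks cherry-picking.
baseline = rng.normal(loc=1.00, scale=0.4, size=200)
cherry_picked = rng.normal(loc=0.95, scale=0.4, size=200)

stat, p_value = mannwhitneyu(baseline, cherry_picked, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.3f}")
# A large p-value means the auditor cannot reject the hypothesis that both pools
# come from the same distribution, i.e. the manipulation is statistically invisible.
```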
Problem

Research questions and friction points this paper is trying to address.

cherry-picking
counterfactual explanations
explanation manipulation
admissible explanation space
auditing
Innovation

Methods, ideas, or system contributions that make the work stand out.

cherry-picking
counterfactual explanations
explanation auditing
admissible explanation space
manipulation detection