🤖 AI Summary
This paper introduces “X-hacking”, a form of methodological bias analogous to p-hacking in statistics: an analyst systematically searches the Rashomon set (i.e., the set of models with comparable predictive performance but divergent explanations) for a model whose XAI attributions (e.g., SHAP values) support a preconceived conclusion. To demonstrate that X-hacking is practical, the authors formulate the trade-off between predictive accuracy and a desired explanation pattern as a multi-objective optimization problem and use an AutoML pipeline to search for models that satisfy both, evaluated on UCI/tabular benchmarks. The results illustrate empirically that X-hacking is feasible and severe, undermining the trustworthiness and reproducibility of explainable AI. The paper also suggests possible methods for detecting and preventing X-hacking, arguing that XAI evaluation should move from assessing explanation fidelity in isolation toward jointly verifying explanation validity and predictive performance.
📝 Abstract
Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how an automated machine learning pipeline can be used to search for 'defensible' models that produce a desired explanation while maintaining superior predictive performance to a common baseline. We formulate the trade-off between explanation and accuracy as a multi-objective optimization problem and illustrate the feasibility and severity of X-hacking empirically on familiar real-world datasets. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI research.
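The core mechanism the paper describes — searching a Rashomon set of near-equally accurate models for one whose attributions favour a chosen conclusion — can be sketched in a few lines. This is an illustrative toy, not the paper's AutoML pipeline: it uses a synthetic dataset, a small random-forest hyperparameter grid as the candidate pool, and scikit-learn's permutation importance as a stand-in for SHAP values; `TARGET_FEATURE` and `EPSILON` are assumed names for the analyst's desired feature and the accuracy tolerance.

```python
# Toy sketch of an X-hacking search (assumptions noted in the lead-in):
# among models within EPSILON of the best test accuracy (a "Rashomon set"),
# pick the one attributing the most importance to a pre-chosen feature.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

TARGET_FEATURE = 0   # feature the analyst "wants" to appear important
EPSILON = 0.05       # accuracy tolerance defining the Rashomon set

# Candidate pool: same model family, varied hyperparameters and seeds
# (the paper instead generates candidates with an AutoML pipeline).
candidates = [RandomForestClassifier(max_depth=d, random_state=s).fit(X_tr, y_tr)
              for d in (2, 4, 8) for s in range(3)]

accs = [m.score(X_te, y_te) for m in candidates]
best_acc = max(accs)

# Rashomon set: models with near-best predictive performance.
rashomon = [m for m, a in zip(candidates, accs) if a >= best_acc - EPSILON]

def target_importance(model):
    # Permutation importance as a cheap proxy for a SHAP attribution.
    imp = permutation_importance(model, X_te, y_te, n_repeats=5,
                                 random_state=0)
    return imp.importances_mean[TARGET_FEATURE]

# X-hacking step: the "defensible" model — accurate, yet chosen purely
# because its explanation supports the preconceived conclusion.
hacked = max(rashomon, key=target_importance)
print(f"Rashomon set size: {len(rashomon)}, "
      f"selected model accuracy: {hacked.score(X_te, y_te):.3f}")
```

The point of the sketch is that every model in `rashomon` would pass a routine accuracy check, so the selection on attributions leaves no trace in the usual performance metrics — which is exactly why the paper argues explanations must be validated jointly with predictive performance.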