🤖 AI Summary
Current experimental methods for evaluating prediction-assisted human decision-making in automated decision systems (ADS) rely on oversimplified assumptions about human cognition; specific design choices can themselves induce cognitive biases in decision-makers, substantially distorting the estimated effects of the prediction intervention.
Method: The paper formally models a range of human response behaviors to predictive assistance and shows that these behaviors create dependencies across decision subjects, violating standard causal identification assumptions, in particular the Stable Unit Treatment Value Assumption (SUTVA, formalized in the sketch after this summary). The analysis combines behavioral modeling, causal inference, and experimental design.
Contribution/Results: The paper demonstrates that common evaluation protocols can yield systematically biased treatment effect estimates under realistic behavioral models, and it motivates evaluation designs that respect both behavioral fidelity and causal identifiability, strengthening the scientific validity of ADS impact assessments.
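As a concrete illustration (our notation, not drawn from the paper), write $Z_i \in \{0,1\}$ for whether decision subject $i$'s case is shown with the model prediction and $Y_i(\cdot)$ for the potential outcome. SUTVA requires:

```latex
% SUTVA (no interference): each subject's potential outcome depends only on
% that subject's own treatment assignment.
\[
  Y_i(z_1, \dots, z_n) = Y_i(z_i)
  \quad \text{for all assignments } (z_1, \dots, z_n) \in \{0,1\}^n .
\]
% Under a behavioral dependency such as anchoring, the decision-maker's
% response to case i shifts with the predictions seen on earlier cases j < i,
% so Y_i becomes a function of (z_1, ..., z_i). The equality fails, and the
% difference-in-means estimand no longer identifies E[Y_i(1) - Y_i(0)].
```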
📝 Abstract
Automated decision systems (ADS) are broadly deployed to inform and support human decision-making across a wide range of consequential settings. However, various context-specific details complicate the goal of establishing meaningful experimental evaluations for prediction-based interventions. Notably, current experiment designs rely on simplifying assumptions about human decision-making in order to derive causal estimates. In reality, specific experimental design decisions may induce cognitive biases in human decision-makers, which could then significantly alter the observed effect sizes of the prediction intervention. In this paper, we formalize and investigate various models of human decision-making in the presence of a predictive model aid. We show that each of these behavioral models produces dependencies across decision subjects and results in the violation of existing assumptions, with consequences for treatment effect estimation. This work aims to further advance the scientific validity of intervention-based evaluation schemes for the assessment of ADS deployments.
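To make the failure mode concrete, here is a minimal simulation sketch (ours, not from the paper; the anchoring model and all parameter values are illustrative assumptions). A decision-maker's internal threshold drifts toward model scores seen on earlier cases, so each decision depends on other subjects' treatment assignments, SUTVA fails, and the naive difference-in-means comparison is contaminated:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_cases=2000, anchoring=0.8):
    """Sequentially reviewed cases; z[i] = 1 means the model score is shown.

    The decision-maker's threshold drifts toward recently seen scores
    (an anchoring-style carryover), so case i's decision depends on the
    treatment assignments of earlier cases -- a SUTVA violation.
    """
    risk = rng.uniform(0, 1, n_cases)                           # true risk
    score = np.clip(risk + rng.normal(0, 0.1, n_cases), 0, 1)   # model score
    z = rng.integers(0, 2, n_cases)          # randomized: show score or not

    threshold = 0.5
    decisions = np.empty(n_cases)
    for i in range(n_cases):
        # Treated cases are judged on the score; control cases on a noisy read.
        signal = score[i] if z[i] else risk[i] + rng.normal(0, 0.3)
        decisions[i] = float(signal > threshold)
        if z[i]:  # seeing a score anchors the threshold for LATER cases
            threshold = anchoring * threshold + (1 - anchoring) * score[i]
    correct = decisions == (risk > 0.5)
    return z, correct.astype(float)

# Naive difference-in-means, valid only under SUTVA:
z, correct = simulate()
print(f"estimate with carryover: {correct[z == 1].mean() - correct[z == 0].mean():.3f}")

# anchoring=1.0 keeps the threshold fixed, so no interference occurs:
z0, c0 = simulate(anchoring=1.0)
print(f"estimate without carryover: {c0[z0 == 1].mean() - c0[z0 == 0].mean():.3f}")
```

The two printed estimates differ because, with carryover, the control arm's decisions are influenced by scores shown in the treated arm; the comparison with `anchoring=1.0` serves as the no-interference baseline.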