Valuing Winners: When and How to Correct for Selection Bias in Randomized Experiments

📅 2026-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the winner's curse in randomized experiments, where the estimated effect of the best-performing treatment is inflated by statistical noise. It distinguishes a global winner's curse (bias relative to the true best treatment) from a selective one (bias relative to the selected treatment's true mean) and links both to the regret incurred from deploying a suboptimal treatment. Across seven decision-oriented evaluation targets, the work compares bias-correction methods, including plug-in estimation, cross-fitting, and resampling, and argues that the correction should be matched to the manager's objective. It also introduces an adaptive empirical likelihood procedure that delivers asymptotically valid confidence intervals without the tuning sensitivity of resampling-based methods. Simulations with varying effect sizes and numbers of arms, together with data calibrated to an online A/B testing platform, show that no single method dominates across settings.

📝 Abstract

Decision-makers often deploy the best-performing treatment from a randomized experiment, creating a winner's curse: selection favors treatments whose observed outcomes are high partly because of statistical noise, so the naïve estimate of the winner is upward biased. We distinguish two forms of winner's curse, bias relative to the true best treatment (global) and bias relative to the selected treatment's true mean (selective), and link them to regret from deploying a suboptimal treatment. This framework defines seven decision-relevant evaluation targets: mean bias, mean squared error, and confidence interval coverage for the global and selective winner's curse, and mean regret. We then show that methods that perform well on one target can perform poorly on others, so corrections should be matched to the manager's objective. Across simulations with varying effect sizes, multiple-arm settings, and data calibrated to an online A/B testing platform, no method dominates uniformly: the plug-in estimator performs best when treatment differences are large, cross-fitting performs best when treatments are similar, and resampling methods often achieve low mean squared error for moderate differences. We also introduce an adaptive empirical likelihood procedure that delivers asymptotically valid confidence intervals across settings without the tuning sensitivity of resampling-based methods.
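The quantities named in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's code: the normal noise model, the function names `simulate_winner` and `simulate_split`, and all parameter values are assumptions made for illustration. The second function uses a plain two-way sample split (select on one half, estimate on the other) as a simplified stand-in for cross-fitting, not the estimator evaluated in the paper.

```python
import numpy as np


def simulate_winner(mu, n=100, sigma=1.0, reps=20000, seed=0):
    """Naive 'pick the best arm and report its sample mean' under normal noise.

    mu holds the (hypothetical) true arm means; n is the per-arm sample size.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    se = sigma / np.sqrt(n)
    # Sample mean of every arm in every replication, shape (reps, K).
    xbar = mu + se * rng.standard_normal((reps, mu.size))
    winner = xbar.argmax(axis=1)              # arm that would be deployed
    naive = xbar[np.arange(reps), winner]     # naive estimate of the winner
    mu_sel = mu[winner]                       # true mean of the selected arm
    return {
        # Bias relative to the true best treatment (global winner's curse).
        "global_bias": float(np.mean(naive - mu.max())),
        # Bias relative to the selected treatment's true mean (selective).
        "selective_bias": float(np.mean(naive - mu_sel)),
        # Loss from deploying a treatment that is not the true best.
        "regret": float(np.mean(mu.max() - mu_sel)),
    }


def simulate_split(mu, n=100, sigma=1.0, reps=20000, seed=0):
    """Select the winner on one half of the data, estimate it on the other."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    se_half = sigma / np.sqrt(n / 2)          # each half has n / 2 observations
    half_a = mu + se_half * rng.standard_normal((reps, mu.size))
    half_b = mu + se_half * rng.standard_normal((reps, mu.size))
    winner = half_a.argmax(axis=1)            # selection uses half A only
    est = half_b[np.arange(reps), winner]     # estimation uses half B only
    mu_sel = mu[winner]
    return {
        "selective_bias": float(np.mean(est - mu_sel)),
        "regret": float(np.mean(mu.max() - mu_sel)),
    }


if __name__ == "__main__":
    for means in ([0.0, 0.0, 0.0, 0.0], [0.0, 0.05, 0.1], [0.0, 1.0]):
        print(means, simulate_winner(means), simulate_split(means))
```

In each replication the naive error relative to the true best arm equals the error relative to the selected arm minus the regret, so in this sketch the global bias is the selective bias minus mean regret. The split estimator removes the selective bias because selection and estimation use independent data, but it selects on half the sample, which tends to raise regret when arms are close.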

Problem

Research questions and friction points this paper is trying to address.

selection bias

winner's curse

randomized experiments

treatment effect

evaluation targets

Innovation

Methods, ideas, or system contributions that make the work stand out.

winner's curse

selection bias

adaptive empirical likelihood