🤖 AI Summary
Adaptive experiments are increasingly deployed in real-world settings, yet the statistical foundations for valid inference in them remain underdeveloped. For two-stage designs in particular, the asymptotic behavior of weighted inverse probability weighted (WIPW) estimators has lacked a unified characterization across signal regimes. This paper derives new weak convergence results for mean outcomes and their differences under significantly weaker assumptions than prior work, and it sharply characterizes the phase transitions in limiting behavior across signal regimes. Because the limiting distributions can be non-normal, the paper proposes a computationally efficient and provably valid plug-in bootstrap method for hypothesis testing. The results are general enough to accommodate a range of adaptive designs, including batched bandit and subgroup enrichment experiments. Simulations and semi-synthetic studies illustrate the practical value of the approach and reveal statistical phenomena unique to adaptive experiments.
📝 Abstract
Adaptive experiments are becoming increasingly popular in real-world applications for effectively maximizing in-sample welfare and efficiency by data-driven sampling. Despite their growing prevalence, however, the statistical foundations for valid inference in such settings remain underdeveloped. Focusing on two-stage adaptive experimental designs, we address this gap by deriving new weak convergence results for mean outcomes and their differences. In particular, our results apply to a broad class of estimators, the weighted inverse probability weighted (WIPW) estimators. In contrast to prior works, our results require significantly weaker assumptions and sharply characterize phase transitions in limiting behavior across different signal regimes. Through this common lens, our general results unify previously fragmented results under the two-stage setup. To address the challenge of potential non-normal limits in conducting inference, we propose a computationally efficient and provably valid plug-in bootstrap method for hypothesis testing. Our results and approaches are sufficiently general to accommodate various adaptive experimental designs, including batched bandit and subgroup enrichment experiments. Simulations and semi-synthetic studies demonstrate the practical value of our approach, revealing statistical phenomena unique to adaptive experiments.
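To make the setting concrete, the following is a minimal sketch of a two-stage adaptive experiment and a weighted IPW estimate of an arm's mean outcome. It is an illustration under stated assumptions, not necessarily the paper's exact definition of the WIPW class: the two-arm Gaussian outcome model, the epsilon-greedy stage-2 rule, the normalization by total weight, and the names `run_two_stage` and `weighted_ipw` are all hypothetical choices made here. The plug-in bootstrap is not sketched because the abstract does not give its details.

```python
import math
import random


def run_two_stage(means, n1, n2, eps=0.1, rng=None):
    """Simulate a two-arm, two-stage experiment with unit-variance Gaussian outcomes.

    Stage 1 assigns each arm with probability 1/2. Stage 2 gives the arm with
    the larger stage-1 sample mean probability 1 - eps (an epsilon-greedy
    rule, assumed here purely for illustration).
    Returns a list of (arm, outcome, propensities) records.
    """
    rng = rng or random.Random(0)
    data = []
    probs = [0.5, 0.5]
    for _ in range(n1):
        arm = 0 if rng.random() < probs[0] else 1
        data.append((arm, rng.gauss(means[arm], 1.0), tuple(probs)))

    # Stage-1 sample means determine the stage-2 allocation.
    sums, counts = [0.0, 0.0], [0, 0]
    for arm, y, _ in data:
        sums[arm] += y
        counts[arm] += 1
    avg = [sums[a] / counts[a] if counts[a] else 0.0 for a in (0, 1)]
    best = 0 if avg[0] >= avg[1] else 1
    probs = [eps, eps]
    probs[best] = 1.0 - eps

    for _ in range(n2):
        arm = 0 if rng.random() < probs[0] else 1
        data.append((arm, rng.gauss(means[arm], 1.0), tuple(probs)))
    return data


def weighted_ipw(data, arm, weight=lambda e: 1.0):
    """Weighted IPW estimate of the mean outcome of `arm`.

    Each observation contributes weight(e) * 1{A == arm} * Y / e, where e is
    the probability with which `arm` was assigned at that time. The sum is
    normalized by the total weight. weight(e) = 1 recovers plain IPW.
    """
    num = 0.0
    den = 0.0
    for a, y, probs in data:
        e = probs[arm]
        h = weight(e)
        den += h
        if a == arm:
            num += h * y / e
    return num / den


if __name__ == "__main__":
    data = run_two_stage(means=[0.0, 0.5], n1=500, n2=500)
    # Estimated difference in mean outcomes, with constant and sqrt(e) weights.
    diff_const = weighted_ipw(data, 1) - weighted_ipw(data, 0)
    diff_sqrt = weighted_ipw(data, 1, math.sqrt) - weighted_ipw(data, 0, math.sqrt)
    print(diff_const, diff_sqrt)
```

Because the stage-2 assignment probabilities depend on stage-1 outcomes, the sampling distribution of such estimates need not be approximately normal, for example when the two arm means are close. This is the kind of regime-dependent limiting behavior the abstract refers to.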