🤖 AI Summary
This paper studies contextual best-arm identification (contextual BAI) in adaptive experimental design: under a fixed sampling budget, treatments are allocated dynamically based on covariates so that an individualized policy minimizing worst-case expected regret can be recommended afterward. The authors propose PLAS, an adaptive sampling and policy learning strategy that jointly optimizes treatment allocation and policy estimation, and present it as the first method to establish asymptotic optimality for contextual BAI: the leading factor of its regret upper bound matches the information-theoretic lower bound as the number of experimental units grows. The analysis integrates techniques from best-arm identification, contextual bandits, asymptotic statistics, and regret analysis. By combining statistical efficiency with computational feasibility, PLAS offers a paradigm for evidence-driven, personalized policy design that is rigorously grounded in theory yet practically implementable.
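For concreteness, a standard way to write this criterion (assuming conventional notation, not necessarily the paper's exact definitions): let $\mu^a(x)$ be the conditional mean outcome of arm $a$ given covariates $x$, $a^*(x)$ the best arm at $x$, and $\hat{\pi}_T$ the policy recommended after $T$ experimental units.

```latex
% Expected simple regret of a recommended policy \hat{\pi}_T under distribution P,
% and the worst-case criterion over a class of candidate distributions \mathcal{P}
% (sketch with assumed standard notation, not the paper's exact definitions).
R_P(\hat{\pi}_T)
  = \mathbb{E}_{X \sim P}\!\big[\, \mu^{a^*(X)}(X) - \mu^{\hat{\pi}_T(X)}(X) \,\big],
\qquad
\sup_{P \in \mathcal{P}} \; \mathbb{E}_P\!\big[ R_P(\hat{\pi}_T) \big].
```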
📝 Abstract
Evidence-based targeting has attracted growing interest among practitioners in policy and business. Formulating the decision-maker's policy learning as a fixed-budget best arm identification (BAI) problem with contextual information, we study an optimal adaptive experimental design for policy learning with multiple treatment arms. In the sampling stage, the planner assigns treatment arms adaptively to sequentially arriving experimental units upon observing their contextual information (covariates). After the experiment, the planner recommends an individualized assignment rule to the population. Taking the worst-case expected regret as the performance criterion for the adaptive sampling and recommended policies, we derive its asymptotic lower bounds and propose a strategy, the Adaptive Sampling-Policy Learning strategy (PLAS), whose regret upper bound has a leading factor that matches the lower bound as the number of experimental units increases.
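A minimal, self-contained sketch of the two-stage structure described in the abstract, using a synthetic linear outcome model. The epsilon-greedy-style allocation, the exploration schedule, and the per-arm least-squares plug-in recommendation are placeholder choices for illustration only; they are not the PLAS allocation rule and carry none of its guarantees.

```python
# Sketch of a fixed-budget adaptive experiment with covariates:
# (1) sampling stage assigns arms adaptively after observing covariates,
# (2) recommendation stage outputs an individualized assignment rule.
# All modeling choices below are placeholders, not the PLAS strategy.
import numpy as np

rng = np.random.default_rng(0)
n_arms, dim, budget = 3, 2, 500

# Hypothetical data-generating process: linear conditional means per arm.
true_theta = rng.normal(size=(n_arms, dim))
def draw_unit(arm, x):
    return true_theta[arm] @ x + rng.normal(scale=0.5)

# Per-arm least-squares statistics for estimating mu^a(x) = theta_a' x.
XtX = [np.eye(dim) * 1e-3 for _ in range(n_arms)]   # small ridge for stability
Xty = [np.zeros(dim) for _ in range(n_arms)]

def mu_hat(arm, x):
    return np.linalg.solve(XtX[arm], Xty[arm]) @ x

# Sampling stage: assign arms adaptively over sequentially arriving units.
for t in range(budget):
    x = rng.normal(size=dim)                      # observe covariates
    eps = max(0.1, 1.0 - t / budget)              # placeholder exploration schedule
    if rng.random() < eps:
        arm = int(rng.integers(n_arms))           # explore uniformly
    else:
        arm = int(np.argmax([mu_hat(a, x) for a in range(n_arms)]))
    y = draw_unit(arm, x)                         # observe outcome
    XtX[arm] += np.outer(x, x)
    Xty[arm] += y * x

# Recommendation stage: plug-in individualized policy from the fitted models.
def recommended_policy(x):
    return int(np.argmax([mu_hat(a, x) for a in range(n_arms)]))

x_new = rng.normal(size=dim)
print("recommended arm for a new unit:", recommended_policy(x_new))
```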