Adaptive Experimental Design for Policy Learning

📅 2024-01-08
🏛️ arXiv.org
🤖 AI Summary
This paper studies contextual best-arm identification (contextual BAI) under a fixed sampling budget. During the experiment, the planner assigns treatments adaptively based on each unit's covariates; afterward, the planner recommends an individualized treatment policy. Performance is measured by worst-case expected regret. The authors derive asymptotic lower bounds on this regret and propose PLAS, the Adaptive Sampling-Policy Learning strategy, which couples the treatment-allocation rule with the policy-estimation step. The leading factor of the PLAS regret upper bound matches the lower bound as the number of experimental units grows, so the strategy is asymptotically optimal in that sense. The analysis draws on best-arm identification, contextual bandits, asymptotic statistics, and regret analysis.

📝 Abstract
Evidence-based targeting has been a topic of growing interest among practitioners in policy and business. Formulating the decision-maker's policy learning as a fixed-budget best arm identification (BAI) problem with contextual information, we study an optimal adaptive experimental design for policy learning with multiple treatment arms. In the sampling stage, the planner adaptively assigns treatment arms to sequentially arriving experimental units after observing their contextual information (covariates). After the experiment, the planner recommends an individualized assignment rule to the population. Taking the worst-case expected regret as the performance criterion for both the adaptive sampling and the recommended policy, we derive asymptotic lower bounds on it and propose a strategy, the Adaptive Sampling-Policy Learning strategy (PLAS), whose regret upper bound has a leading factor that matches the lower bound as the number of experimental units increases.
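The two-stage protocol in the abstract (adaptive sampling, then policy recommendation) can be illustrated with a small simulation. This is only a sketch under strong assumptions and is not the PLAS strategy itself: contexts are discrete and arrive uniformly at random, outcomes are Gaussian, and the sampling rule is a simple heuristic that draws arms in proportion to their estimated standard deviations. The function names `run_experiment` and `policy_regret`, the `warmup` parameter, and all numeric values are hypothetical.

```python
import numpy as np

def run_experiment(means, sds, budget, rng, warmup=5):
    """Simulate one fixed-budget experiment and return a recommended policy.

    means, sds: arrays of shape (n_contexts, n_arms) holding the true outcome
    means and standard deviations (hypothetical; used only to simulate data).
    The sampling rule below is an illustrative heuristic, NOT the paper's PLAS.
    """
    n_ctx, n_arms = means.shape
    counts = np.zeros((n_ctx, n_arms))
    sums = np.zeros((n_ctx, n_arms))
    sq_sums = np.zeros((n_ctx, n_arms))

    for _ in range(budget):
        x = int(rng.integers(n_ctx))  # a unit arrives; its context is observed
        if counts[x].min() < warmup:
            # Warm-up: pull the least-sampled arm so variances can be estimated.
            a = int(np.argmin(counts[x]))
        else:
            mean_hat = sums[x] / counts[x]
            var_hat = np.maximum(sq_sums[x] / counts[x] - mean_hat**2, 1e-8)
            sd_hat = np.sqrt(var_hat)
            # Assumed heuristic: sample arms in proportion to estimated std devs.
            a = int(rng.choice(n_arms, p=sd_hat / sd_hat.sum()))
        y = rng.normal(means[x, a], sds[x, a])  # observe the outcome
        counts[x, a] += 1
        sums[x, a] += y
        sq_sums[x, a] += y * y

    # Recommendation stage: per context, pick the arm with the best sample mean.
    mean_hat = np.where(counts > 0, sums / np.maximum(counts, 1), -np.inf)
    return np.argmax(mean_hat, axis=1)

def policy_regret(means, policy):
    """Regret of a policy, averaged over uniformly distributed contexts."""
    best = means.max(axis=1)
    chosen = means[np.arange(means.shape[0]), policy]
    return float(np.mean(best - chosen))

# Usage with made-up parameters: 2 contexts, 3 arms.
true_means = np.array([[0.0, 2.0, 0.0], [2.0, 0.0, 0.0]])
true_sds = np.full_like(true_means, 0.5)
policy = run_experiment(true_means, true_sds, budget=2000,
                        rng=np.random.default_rng(0))
print("recommended arms per context:", policy)
print("regret of recommended policy:", policy_regret(true_means, policy))
```

Averaging `policy_regret` over many simulated experiments approximates the expected regret for one fixed outcome distribution; the paper's criterion additionally takes the worst case over a class of distributions, which this sketch does not attempt.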
Problem

Research questions and friction points this paper is trying to address.

Identify the best treatment arm for each unit using contextual information
Minimize worst-case expected regret in policy learning
Develop an optimal adaptive sampling strategy for experiments
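
For reference, the worst-case expected regret named above can be written out as follows. The notation is an assumption made for illustration and may differ from the paper's: $X$ is a context, $a \in \{1, \dots, K\}$ indexes treatment arms, $\mu_P^a(x)$ is the mean outcome of arm $a$ given $X = x$ under distribution $P$, $\hat{\pi}$ is the policy recommended after the experiment, and $\mathcal{P}$ is the class of distributions considered.

```latex
% Expected regret of the recommended policy under a fixed distribution P.
% The expectation is over the experimental sample (which determines \hat{\pi})
% and over a fresh context X drawn from the population.
R_P(\hat{\pi}) = \mathbb{E}_P\!\left[ \max_{a \in \{1,\dots,K\}} \mu_P^{a}(X) - \mu_P^{\hat{\pi}(X)}(X) \right]

% Worst-case criterion: the largest expected regret over the class \mathcal{P}.
\sup_{P \in \mathcal{P}} R_P(\hat{\pi})
```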
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Sampling-Policy Learning strategy
Minimax rate-optimal regret performance
Contextual best arm identification problem