🤖 AI Summary
This study addresses the challenge of acquiring high-dimensional covariates under a measurement budget in randomized controlled trials. It formulates covariate collection as a sequential optimization problem embedded within design-based causal inference: across successive batches, the method learns which covariates are most prognostic and uses the selected covariates for rerandomization and regression adjustment, thereby reducing the variance of batch-level average treatment effect estimates. The work establishes a decoupling result showing that adaptive covariate selection based on past batches preserves the validity of within-batch randomization, shows that the cumulative inverse-variance-weighted estimator achieves at least nominal asymptotic coverage, and derives a Bayes risk bound for the acquisition layer that matches the minimax lower bound up to logarithmic factors. The proposed method, DARTS, combines a budgeted combinatorial Thompson sampler with a sequential experimental framework, significantly narrowing the efficiency gap relative to oracle designs while maintaining inferential validity.
📝 Abstract
Randomized controlled trials typically assume that prognostic covariates are known and available at no cost. In practice, obtaining high-dimensional pretreatment data is costly, forcing a trade-off between covariate-adaptive precision and a measurement budget. We introduce Dynamic Adaptive Rerandomization via Thompson Sampling (DARTS), which treats covariate acquisition as a sequential optimization problem embedded within a design-based causal inference task. A budgeted combinatorial Thompson sampler learns which covariates are most prognostic across successive batches; selected covariates then drive rerandomization and regression adjustment to reduce batch-level average treatment effect variance. Our primary theoretical contribution is a decoupling result: adaptive covariate selection based on past batches preserves batch-level randomization validity, and the cumulative inverse-variance weighted estimator achieves at least nominal asymptotic coverage. We further derive a Bayes risk bound for the acquisition layer that matches the minimax lower bound up to logarithmic factors. Empirically, DARTS systematically concentrates the budget on informative features, significantly closing the efficiency gap to oracle designs while maintaining strict inferential validity.
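To make the cumulative inverse-variance weighted estimator concrete, the following is a minimal sketch of generic inverse-variance pooling of per-batch average treatment effect estimates. It is not the paper's implementation: the function names `ivw_combine` and `normal_ci` are hypothetical, the per-batch variances are treated as known inputs, and the interval uses a plain normal approximation. How DARTS estimates batch-level variances under rerandomization is not specified in the abstract and is not modeled here.

```python
import math


def ivw_combine(estimates, variances):
    """Pool per-batch estimates using weights proportional to 1 / variance.

    Hypothetical helper: `estimates` and `variances` are equal-length
    sequences of batch-level ATE estimates and their (assumed known) variances.
    """
    if len(estimates) != len(variances) or len(estimates) == 0:
        raise ValueError("need one variance per estimate and at least one batch")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    # With independent batches, the pooled variance is 1 / (sum of the weights).
    pooled_variance = 1.0 / total_weight
    return pooled, pooled_variance


def normal_ci(estimate, variance, z=1.96):
    """Normal-approximation interval; z=1.96 is roughly the 95% level."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Illustrative numbers only, not results from the paper.
batch_estimates = [0.0, 4.0]
batch_variances = [1.0, 3.0]
pooled, pooled_var = ivw_combine(batch_estimates, batch_variances)
lower, upper = normal_ci(pooled, pooled_var)
```

More precise batches receive larger weights, so batches in which the selected covariates explain more outcome variance pull the pooled estimate toward themselves and shrink the pooled variance.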