🤖 AI Summary
This work proposes the first framework that integrates active learning into heterogeneous treatment effect estimation for randomized controlled trials run under budget constraints, particularly when the available observational data exhibit policy-induced bias. Leveraging prior knowledge from observational data, the method designs an acquisition function that iteratively selects the most informative samples for experimental evaluation by jointly accounting for estimation uncertainty, insufficient overlap, and distributional discrepancy between domains. Theoretical analysis, grounded in martingale central limit theorems and minimax lower bounds, establishes the information-theoretic optimality of the proposed approach. Empirical evaluations on industrial datasets demonstrate substantial improvements over random sampling baselines, achieving more accurate causal effect estimates at significantly lower experimental costs.
📝 Abstract
Estimating heterogeneous treatment effects is central to data-driven decision-making, yet industrial applications often face a fundamental tension between limited randomized controlled trial (RCT) budgets and abundant but biased observational data collected under historical targeting policies. Although observational logs offer the advantage of scale, they inherently suffer from severe policy-induced imbalance and overlap violations, rendering standalone estimation unreliable. We propose a budgeted active experimentation framework that iteratively enhances model training for causal effect estimation via active sampling. By leveraging observational priors, we develop an acquisition function targeting uplift estimation uncertainty, overlap deficits, and domain discrepancy to select the most informative units for randomized experiments. We establish finite-sample deviation bounds, asymptotic normality via martingale central limit theorems (CLTs), and minimax lower bounds to prove information-theoretic optimality. Extensive experiments on industrial datasets demonstrate that our approach significantly outperforms standard randomized baselines in cost-constrained settings.
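As a rough illustration of the kind of acquisition rule the abstract describes, the sketch below scores candidate units by a weighted sum of three terms and picks the top-scoring units within a budget. This is not the paper's acquisition function: the abstract does not give its form, so every name and formula here is an assumption. In particular, uplift uncertainty is approximated by the standard deviation across a hypothetical ensemble of uplift predictions, the overlap deficit by `1 - 4e(1 - e)` for an observational propensity `e`, and `domain_gap` is taken as a precomputed per-unit discrepancy score.

```python
import numpy as np


def acquisition_scores(uplift_preds, propensity, domain_gap, weights=(1.0, 1.0, 1.0)):
    """Hypothetical acquisition score; all three terms are illustrative assumptions.

    uplift_preds: array (n_models, n_units) of uplift estimates from an ensemble.
    propensity:   array (n_units,) of observational treatment propensities in [0, 1].
    domain_gap:   array (n_units,) of per-unit observational-vs-RCT discrepancy scores.
    """
    uplift_preds = np.asarray(uplift_preds, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    domain_gap = np.asarray(domain_gap, dtype=float)

    # Assumed proxy for uplift uncertainty: disagreement across ensemble members.
    uncertainty = uplift_preds.std(axis=0)
    # Assumed proxy for overlap deficit: 0 at e = 0.5, approaching 1 as e nears 0 or 1.
    overlap_deficit = 1.0 - 4.0 * propensity * (1.0 - propensity)

    def rescale(v):
        # Min-max rescaling so the three terms are on a comparable [0, 1] scale.
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    w_u, w_o, w_d = weights
    return (w_u * rescale(uncertainty)
            + w_o * rescale(overlap_deficit)
            + w_d * rescale(domain_gap))


def select_batch(scores, budget):
    """Return indices of the highest-scoring units, at most `budget` of them."""
    scores = np.asarray(scores, dtype=float)
    budget = min(budget, scores.size)
    return np.argsort(-scores, kind="stable")[:budget]
```

In an iterative loop, one would presumably refit the uplift model on the newly randomized units after each batch and recompute the scores. The weights and the min-max rescaling are design choices made here for readability only.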