Conditional cross-fitting for unbiased machine-learning-assisted covariate adjustment in randomized experiments

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In randomized experiments, covariate adjustment can reduce the asymptotic variance of the Horvitz–Thompson estimator but introduces finite-sample bias due to data reuse; conventional cross-fitting relies on i.i.d. assumptions and is invalid under design-based inference. To address this, we propose **conditional cross-fitting**: a sample-splitting procedure that respects the experimental design structure, enabling unbiased average treatment effect (ATE) estimation under common randomization schemes (e.g., Bernoulli, completely randomized, and stratified designs). Our method integrates flexible machine-learning predictors, tolerates model misspecification, and simultaneously ensures unbiasedness and low variance. We establish unbiasedness and validity of the inference procedures under design-based asymptotics, and demonstrate robust finite-sample performance across diverse randomization designs via simulation studies.

📝 Abstract
Randomized experiments are the gold standard for estimating the average treatment effect (ATE). While covariate adjustment can reduce the asymptotic variances of the unbiased Horvitz–Thompson estimators for the ATE, it suffers from finite-sample biases due to data reuse in both prediction and estimation. Traditional sample-splitting and cross-fitting methods can address the problem of data reuse and obtain unbiased estimators. However, they require that the data are independently and identically distributed, which is usually violated under the design-based inference framework for randomized experiments. To address this challenge, we propose a novel conditional cross-fitting method, under the design-based inference framework, where potential outcomes and covariates are fixed and the randomization is the sole source of randomness. We propose sample-splitting algorithms for various randomized experiments, including Bernoulli randomized experiments, completely randomized experiments, and stratified randomized experiments. Based on the proposed algorithms, we construct unbiased covariate-adjusted ATE estimators and propose valid inference procedures. Our methods can accommodate flexible machine-learning-assisted covariate adjustments and allow for model misspecification.
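For intuition, here is a minimal, hypothetical sketch (not the paper's actual algorithm) of cross-fitted covariate adjustment in a simulated completely randomized experiment. The key design-respecting step is that sample splitting is performed within each treatment arm, so every fold preserves the design's treatment proportions; the predictor, estimator form, and all variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic completely randomized experiment (illustrative only) ---
n = 1000
x = rng.normal(size=(n, 2))                      # covariates (fixed under design-based inference)
y0 = x @ np.array([1.0, -0.5]) + rng.normal(size=n)
y1 = y0 + 2.0                                    # true ATE = 2 in this toy example
z = np.zeros(n, dtype=bool)
z[rng.choice(n, n // 2, replace=False)] = True   # complete randomization, n/2 treated
y = np.where(z, y1, y0)                          # observed outcomes

def cross_fitted_adjusted_ate(y, z, x, folds=2):
    """Illustrative cross-fitted, covariate-adjusted ATE estimator.

    Splits treated and control units SEPARATELY into folds (so each fold
    mirrors the design's treatment proportions), fits a linear outcome
    predictor on the complementary folds, and adjusts the held-out fold's
    difference in means by the predicted outcomes.
    """
    n = len(y)
    fold = np.empty(n, dtype=int)
    for arm in (True, False):                    # split within each arm
        idx = np.flatnonzero(z == arm)
        fold[idx] = np.arange(len(idx)) % folds
    est = 0.0
    for k in range(folds):
        test, train = fold == k, fold != k
        # simple least-squares predictor fit on the other folds
        X_tr = np.column_stack([np.ones(train.sum()), x[train]])
        beta = np.linalg.lstsq(X_tr, y[train], rcond=None)[0]
        mu = np.column_stack([np.ones(test.sum()), x[test]]) @ beta
        yt, zt = y[test], z[test]
        # regression-adjusted difference in means on the held-out fold
        tau_k = (yt[zt] - mu[zt]).mean() - (yt[~zt] - mu[~zt]).mean()
        est += tau_k * test.sum() / n            # fold-size-weighted average
    return est

tau_hat = cross_fitted_adjusted_ate(y, z, x)
```

Because the predictor is fit only on units outside the fold being estimated, the held-out residuals are not reused between prediction and estimation; the paper's contribution is constructing such splits so this logic remains valid when the randomization itself (not i.i.d. sampling) is the source of randomness.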
Problem

Research questions and friction points this paper is trying to address.

Addressing finite-sample biases in covariate-adjusted ATE estimators
Overcoming data reuse issues under the design-based inference framework
Enabling unbiased machine-learning-assisted adjustment in randomized experiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional cross-fitting for unbiased estimation
Adaptation of cross-fitting to the design-based inference framework
Machine-learning-assisted covariate adjustment algorithms
Xin Lu
Department of Statistics and Data Science, Washington University in St. Louis
Lei Shi
Division of Biostatistics, University of California, Berkeley
Hanzhong Liu
Tsinghua University
high-dimensional statistics · causal inference
Peng Ding
Department of Statistics, University of California, Berkeley