Causal machine learning for heterogeneous treatment effects in the presence of missing outcome data

📅 2024-12-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Estimating heterogeneous causal effects (CATE) under missing-at-random (MAR) outcome data introduces asymptotic bias due to subgroup representation distortion. Method: We propose two debiased machine learning estimators—mDR-learner and mEP-learner—that correct for this distortion via inverse probability weighting of censoring probabilities. Contribution/Results: This work provides the first systematic characterization of the asymptotic bias mechanism induced by MAR missing outcomes on CATE estimation. We construct the first CATE estimator that simultaneously achieves oracle efficiency and practical implementability, overcoming theoretical and empirical limitations of conventional imputation and complete-case methods. Theoretically, our estimators are doubly robust and semiparametrically efficient. In simulations and the ACTG175 clinical trial, they significantly outperform existing approaches—particularly under high missingness (>40%) and strong treatment effect heterogeneity—improving CATE estimation accuracy by over 20%.

Technology Category

Application Category

📝 Abstract
When estimating heterogeneous treatment effects, missing outcome data can complicate treatment effect estimation, causing certain subgroups of the population to be poorly represented. In this work, we discuss this commonly overlooked problem and consider the impact that missing at random (MAR) outcome data has on causal machine learning estimators for the conditional average treatment effect (CATE). We then propose two de-biased machine learning estimators for the CATE, the mDR-learner and mEP-learner, which address the issue of under-representation by integrating inverse probability of censoring weights into the DR-learner and EP-learner respectively. We show that under reasonable conditions, these estimators are oracle efficient, and illustrate their favorable performance through simulated data settings, comparing them to existing CATE estimators, including comparison to estimators which use common missing data techniques. Guidance on the implementation of these estimators is provided and we present an example of their application using the ACTG175 trial, exploring treatment effect heterogeneity when comparing Zidovudine mono-therapy against alternative antiretroviral therapies among HIV-1-infected individuals.
Problem

Research questions and friction points this paper is trying to address.

Causal Inference
Machine Learning
Treatment Effect Heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

mDR learner
mEP learner
heterogeneous treatment effect estimation
🔎 Similar Papers
No similar papers found.
M
Matthew Pryce
Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, United Kingdom
K
Karla Diaz-Ordaz
Department of Statistical Science, University College London, London, United Kingdom
R
Ruth H. Keogh
Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, United Kingdom
Stijn Vansteelandt
Stijn Vansteelandt
Professor of Statistics, Ghent University
Causal inferenceCausal Machine LearningEpidemiologic methodsMediation analysisSemiparametric