🤖 AI Summary
This study addresses causal inference under concurrent confounding and missingness in electronic health record (EHR) data. We critically examine the limitations of conventional two-stage approaches—such as imputation followed by inverse probability weighting (IPW) or outcome regression—which suffer from high bias and poor confidence interval coverage under model misspecification. To this end, we systematically evaluate the nonparametric, efficient, doubly robust joint estimator recently proposed by Levis et al., conducting the first comprehensive comparison against standard methods—including IPW, outcome regression, and multiple imputation—within a simulation framework explicitly designed to reflect realistic EHR study motivations. Results demonstrate no universally dominant method; however, the joint estimator exhibits superior overall robustness. Specifically, it achieves substantially lower bias and higher confidence interval coverage than most two-stage alternatives when both confounding and missingness co-occur under model misspecification. Consequently, we recommend this joint estimator as the default approach for causal inference in such complex, real-world EHR settings.
📝 Abstract
Causal inference methods based on electronic health record (EHR) databases must simultaneously handle confounding and missing data. Vast scholarship exists aimed at addressing these two issues separately, but surprisingly few papers attempt to address them simultaneously. In practice, when faced with simultaneous missing data and confounding, analysts may proceed by first imputing missing data and subsequently using outcome regression or inverse-probability weighting (IPW) to address confounding. However, little is known about the theoretical performance of such $ extit{ad hoc}$ methods. In a recent paper Levis $ extit{et al.}$ outline a robust framework for tackling these problems together under certain identifying conditions, and introduce a pair of estimators for the average treatment effect (ATE), one of which is non-parametric efficient. In this work we present a series of simulations, motivated by a published EHR based study of the long-term effects of bariatric surgery on weight outcomes, to investigate these new estimators and compare them to existing $ extit{ad hoc}$ methods. While the latter perform well in certain scenarios, no single estimator is uniformly best. As such, the work of Levis $ extit{et al.}$ may serve as a reasonable default for causal inference when handling confounding and missing data together.