🤖 AI Summary
Machine learning and artificial intelligence (ML/AI) outcome regression models, widely used to adjust for high-dimensional confounding in causal inference, exhibit systematic prediction bias: predicted outcomes shrink toward the marginal outcome mean, which distorts estimates of the average treatment effect (ATE). This study quantitatively characterizes how that prediction bias propagates into ATE estimation and proposes an unbiased ML/AI regression-based causal inference framework for observational studies, set in the context of target trial emulation with real-world data. The approach is demonstrated on UK Biobank data by estimating the effect of opioid use on cardiovascular health among patients with chronic pain.
📝 Abstract
Real-World Data (RWD), with its large sample sizes and rich clinical detail, offers a compelling alternative to randomized controlled trials (RCTs) for studying treatment effects in diverse and complex patient populations. However, its observational nature introduces confounding that prevents straightforward comparative effectiveness research. Target trial emulation leverages RWD to estimate average treatment effects (ATE) at the population scale and diversity that RCTs cannot achieve, yet its validity depends critically on unbiased ATE estimation under high-dimensional confounding. Many causal inference pipelines address high-dimensional confounding through machine learning and artificial intelligence (ML/AI) outcome regression. However, commonly used ML/AI regression models exhibit systematic prediction bias, with predicted outcomes shrinking toward the marginal outcome mean. This structural bias propagates into ATE estimation and cannot be corrected by cross-fitting, ensemble methods, or any standard ML practice. In this work, we first quantitatively characterize how systematic prediction bias in ML/AI outcome regression leads to biased ATE estimates in causal inference models. We further propose an unbiased ML/AI regression-based causal inference framework to ensure unbiased ATE estimation for observational studies. We demonstrate our approach by studying the effects of opioids on cardiovascular health in patients with chronic pain using UK Biobank data.
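The link between shrunken predictions and a biased ATE can be illustrated with a small simulation. The sketch below is not the authors' framework or data: it assumes a synthetic linear outcome model, uses a deliberately heavy ridge penalty as a stand-in for an ML/AI regressor whose predictions shrink toward the outcome mean, and estimates the ATE with a plug-in (g-computation) estimator. All function names, the sample size, and the penalty value are hypothetical choices made for illustration.

```python
import numpy as np

def fit_ridge(Z, y, lam):
    """Closed-form ridge with an unpenalized intercept; returns (intercept, coefs)."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    coefs = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ yc)
    return y_mean - z_mean @ coefs, coefs

def plug_in_ate(X, t, y, lam):
    """G-computation: fit E[Y | T, X], then average mu(1, X) - mu(0, X)."""
    b0, b = fit_ridge(np.column_stack([t, X]), y, lam)
    mu1 = b0 + np.column_stack([np.ones_like(t), X]) @ b
    mu0 = b0 + np.column_stack([np.zeros_like(t), X]) @ b
    return float(np.mean(mu1 - mu0))

def simulate(n=2000, p=20, tau=2.0, seed=0):
    """Synthetic data (assumed for illustration): true ATE is tau."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))  # X[:, 0] confounds treatment
    t = rng.binomial(1, propensity).astype(float)
    y = tau * t + X @ np.full(p, 0.5) + rng.normal(size=n)
    return X, t, y

X, t, y = simulate()
ate_unpenalized = plug_in_ate(X, t, y, lam=0.0)        # ordinary least squares
ate_shrunk = plug_in_ate(X, t, y, lam=0.5 * len(y))    # heavy shrinkage toward the mean
print(f"true ATE = 2.0, unpenalized = {ate_unpenalized:.2f}, "
      f"heavily penalized = {ate_shrunk:.2f}")
```

In this setup the unpenalized fit recovers an estimate close to the true effect, while the heavily penalized fit pulls the predicted treated and untreated outcomes toward each other and yields a markedly attenuated ATE. Real ML/AI regressors shrink in less transparent ways, so the size and even the direction of the resulting bias may differ from this toy case.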