Trustworthy AI/ML Regression and Unbiased Causal Inference for Real-World Data

📅 2026-05-22
🤖 AI Summary
Causal inference pipelines commonly handle high-dimensional confounding with AI/ML outcome regression, but these regression models exhibit systematic prediction bias: predicted outcomes shrink toward the marginal outcome mean, which distorts estimates of the average treatment effect (ATE). This study first quantitatively characterizes how that prediction bias propagates into ATE estimation, then proposes an unbiased ML/AI regression-based causal inference framework for observational studies in the target trial emulation setting. The approach is demonstrated on UK Biobank data by estimating the effect of opioid use on cardiovascular health among patients with chronic pain.
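One way to see how prediction shrinkage can distort the ATE is a stylized calculation. This is an illustrative sketch in standard notation, not the paper's own characterization: $\mu_a(x) = \mathbb{E}[Y \mid A = a, X = x]$ is the true outcome regression, $\hat{\mu}_a$ its fitted version, and the uniform shrinkage factor $\lambda$ is an assumption introduced here.

```latex
% Target and plug-in (outcome regression) estimator, under standard
% identification assumptions so that \tau = E[\mu_1(X) - \mu_0(X)]:
\tau = \mathbb{E}\bigl[Y(1) - Y(0)\bigr],
\qquad
\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n}\bigl[\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)\bigr]

% Assumed stylized shrinkage toward the marginal mean \bar{Y}, with 0 < \lambda < 1:
\hat{\mu}_a(x) \approx \bar{Y} + (1 - \lambda)\bigl(\mu_a(x) - \bar{Y}\bigr)

% The \bar{Y} terms cancel in the contrast, leaving an attenuated effect:
\mathbb{E}\bigl[\hat{\mu}_1(X) - \hat{\mu}_0(X)\bigr]
  \approx (1 - \lambda)\,\tau,
\qquad
\text{bias} \approx -\lambda\,\tau
```

Under this assumption the bias scales with the true effect and does not vanish as $n$ grows unless $\lambda \to 0$.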
📝 Abstract
Real-World Data (RWD), with its large sample sizes and rich clinical detail, offers a compelling alternative to randomized controlled trials (RCTs) for studying treatment effects in diverse and complex patient populations. However, its observational nature introduces confounding that prevents straightforward comparative effectiveness research. Target trial emulation leverages RWD to estimate average treatment effects (ATE) at the population scale and diversity that RCTs cannot achieve, yet its validity depends critically on unbiased ATE estimation under high-dimensional confounding. Many causal inference pipelines address high-dimensional confounding through machine learning and artificial intelligence (ML/AI) outcome regression. However, commonly used ML/AI regression models exhibit systematic prediction bias, with predicted outcomes shrinking toward the marginal outcome mean. This structural bias propagates into ATE estimation and cannot be corrected by cross-fitting, ensemble methods, or any standard ML practice. In this work, we first quantitatively characterize how systematic prediction bias in ML/AI outcome regression leads to biased ATE estimates in causal inference models. We further propose an unbiased ML/AI regression-based causal inference framework to ensure unbiased ATE estimation for observational studies. We demonstrate our approach by studying the effects of opioids on cardiovascular health in patients with chronic pain using UK Biobank data.
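The shrinkage mechanism described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' framework: it assumes a simulated linear outcome model, uses closed-form ridge regression as a stand-in for a generic ML/AI outcome regression, and all names (`simulate`, `ridge_fit`, `plugin_ate`) and parameter values are hypothetical.

```python
import numpy as np


def simulate(n=5000, p=50, tau=2.0, seed=0):
    """Simulated observational data (assumed setup, not UK Biobank)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))                  # confounders
    beta = rng.normal(size=p) / np.sqrt(p)       # confounder effects
    propensity = 1.0 / (1.0 + np.exp(-X @ beta)) # treatment depends on X
    A = rng.binomial(1, propensity)              # binary treatment
    Y = tau * A + X @ beta + rng.normal(size=n)  # true ATE equals tau
    return X, A, Y


def ridge_fit(D, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    D_mean, y_mean = D.mean(axis=0), y.mean()
    Dc, yc = D - D_mean, y - y_mean
    gram = Dc.T @ Dc + lam * np.eye(D.shape[1])
    coef = np.linalg.solve(gram, Dc.T @ yc)
    intercept = y_mean - D_mean @ coef
    return intercept, coef


def plugin_ate(X, A, Y, lam):
    """Plug-in ATE: average of predicted Y under A=1 minus under A=0."""
    intercept, coef = ridge_fit(np.column_stack([A, X]), Y, lam)
    D1 = np.column_stack([np.ones(len(A)), X])   # everyone treated
    D0 = np.column_stack([np.zeros(len(A)), X])  # everyone untreated
    mu1 = intercept + D1 @ coef
    mu0 = intercept + D0 @ coef
    return float(np.mean(mu1 - mu0))


X, A, Y = simulate()
ate_ols = plugin_ate(X, A, Y, lam=0.0)               # no shrinkage
ate_ridge = plugin_ate(X, A, Y, lam=float(len(Y)))   # heavy shrinkage
print(f"true ATE = 2.0, unpenalized = {ate_ols:.3f}, ridge = {ate_ridge:.3f}")
```

With the penalty set to zero, the plug-in estimate should sit close to the simulated effect of 2.0. The penalized fit is expected to be pulled toward zero because the treatment coefficient is shrunk along with the confounder coefficients, and collecting more data under the same relative penalty does not remove that attenuation.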
Problem

Research questions and friction points this paper is trying to address.

Trustworthy AI
Causal Inference
Real-World Data
Average Treatment Effect
Prediction Bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

unbiased causal inference
systematic prediction bias
target trial emulation
average treatment effect
machine learning regression
Yifei Xu
Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, School of Medicine, University of Maryland; Maryland Psychiatric Research Center, Department of Psychiatry, School of Medicine, University of Maryland; The University of Maryland Institute for Health Computing (UM-IHC)
Hwiyoung Lee
Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, School of Medicine, University of Maryland; Maryland Psychiatric Research Center, Department of Psychiatry, School of Medicine, University of Maryland; The University of Maryland Institute for Health Computing (UM-IHC)
Zhenyao Ye
University of Maryland, Baltimore
Bioinformatics, Biostatistics, Human Genetics
Yezhi Pan
Department of Mathematics, University of Maryland, College Park
Jingsong Zhou
Department of Mathematics, University of Maryland, College Park
Yun Yang
Department of Mathematics, University of Maryland, College Park
Bayesian Statistics, High-dimensional Statistics, Machine Learning, Optimal Transport, Optimization
Chixiang Chen
Associate Professor in Biostatistics, University of Maryland School of Medicine, Baltimore.
Statistics and Biostatistics
Shuo Chen
Professor, University of Maryland, School of Medicine
Biostatistics, neuroimaging statistics, clinical data analysis