🤖 AI Summary
Existing methods for estimating individualized treatment effects struggle to reduce confounding bias while preserving clinical heterogeneity, which compromises patient-specific prediction accuracy. This work identifies that tension as a bias–precision paradox in causal representation learning and proposes a stochastic alignment framework to address it. Its sampling-based Maximum Mean Discrepancy (sMMD) strategy replaces global adversarial balancing with subset-level matching, reducing confounding bias while retaining clinically decisive variables. The framework combines counterfactual outcome prediction with attribution-based interpretability. Evaluated on two large ICU cohorts (n = 27,783), the method reduces prediction error under distribution shift by up to 11.5% and substantially improves recall in high-risk tasks. In human-AI evaluations, it outperforms both clinicians-in-training and large language models, and it improves clinicians’ decision accuracy by 14.7% while reducing decision time.
📝 Abstract
Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, yet existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity, degrading patient-specific predictions. Here, we identify this tension as a bias-precision paradox in causal representation learning and introduce sampling-based maximum mean discrepancy (sMMD), a stochastic alignment strategy that replaces global adversarial balancing with subset-level matching. We instantiate this approach in a framework for counterfactual outcome prediction with attribution-grounded interpretability. Across two large-scale ICU cohorts (n = 27,783), our framework improves accuracy under distribution shift, reducing error by up to 11.5% and substantially increasing recall in high-risk tasks. Mechanistic analyses show that sMMD selectively preserves clinically decisive variables. In human-AI evaluation, our method outperforms clinicians-in-training and large language models, and improves clinician accuracy by 14.7% while reducing decision time, enabling interpretable, real-time clinical decision support.
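The abstract does not specify which kernel sMMD uses or how it draws subsets. As a hedged illustration only, the Python sketch below assumes a Gaussian (RBF) kernel and computes a biased squared-MMD estimate between treated and control representations on randomly drawn subsets, averaged over several draws. The names `rbf_kernel`, `mmd2`, and `sampled_mmd2` and the parameters `bandwidth`, `subset_size`, and `num_draws` are hypothetical; this is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples."""

    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        # Average kernel value over all cross pairs (u, v) with u in a, v in b.
        return sum(rbf_kernel(u, v, bandwidth) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def sampled_mmd2(
    treated: List[Vector],
    control: List[Vector],
    subset_size: int,
    num_draws: int,
    bandwidth: float = 1.0,
    seed: int = 0,
) -> float:
    """Hypothetical sMMD-style penalty (an assumption, not the paper's method):
    compare random subsets of each group instead of the full groups, and
    average the squared MMD over several independent draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        t_sub = rng.sample(treated, min(subset_size, len(treated)))
        c_sub = rng.sample(control, min(subset_size, len(control)))
        total += mmd2(t_sub, c_sub, bandwidth)
    return total / num_draws


if __name__ == "__main__":
    # Toy 2-D "representations": a control group drawn from the same
    # distribution as the treated group, and one shifted by (2, 2).
    rng = random.Random(1)
    treated = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    control_same = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
    control_shift = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(200)]
    print(sampled_mmd2(treated, control_same, subset_size=50, num_draws=10))
    print(sampled_mmd2(treated, control_shift, subset_size=50, num_draws=10))
```

In a representation-learning setup, a penalty of this kind would typically be added to the factual outcome loss so that the encoder is pushed toward balanced treated and control representations. Averaging over random subsets, rather than aligning the full groups at once, is one possible reading of "subset-level matching" under the assumptions stated above.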