🤖 AI Summary
Monte Carlo simulation with generative electronic health record (EHR) models suffers from sparse estimate distributions, high computational cost, and large sampling variance, limiting its ability to stratify patient risk accurately. This work proposes two novel estimators, SCOPE and REACH, which leverage the next-token probability distributions that standard Monte Carlo discards to enable efficient, unbiased estimation of clinical outcomes. We establish, for the first time, that REACH provably reduces variance for any model and outcome, and uncover the critical role of outcome "spontaneity" in estimation efficiency. Built on the ETHOS-ARES framework, the approach combines conditional outcome probability summation (SCOPE) with expected conditional hazard modeling (REACH): on MIMIC-IV hospital mortality prediction, it matches conventional 100-sample Monte Carlo using only 10–11 samples, cutting inference cost roughly tenfold while maintaining good calibration, with a more modest ~1.2× efficiency gain for ICU admission prediction.
📝 Abstract
Generative models trained by self-supervision on tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction, which is typically performed via Monte Carlo simulation of future patient trajectories. However, existing approaches suffer from three key limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational cost, and high sampling variance. We propose two new estimators, the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH), that leverage the next-token probability distributions discarded by standard Monte Carlo. We prove both estimators are unbiased and that REACH guarantees variance reduction over Monte Carlo sampling for any model and outcome. Empirically, on hospital mortality prediction in MIMIC-IV using the ETHOS-ARES framework, SCOPE and REACH match 100-sample Monte Carlo performance using only 10–11 samples (95% CI: [9, 11]), representing a ~10× reduction in inference cost without degrading calibration. For ICU admission prediction, efficiency gains are more modest (~1.2×), which we attribute to the outcome's lower "spontaneity," a property we characterize theoretically and empirically. These methods substantially improve the feasibility of deploying generative EHR models in resource-constrained clinical settings.
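To make the estimator contrast concrete, here is a minimal toy sketch of the SCOPE idea: along each sampled trajectory, accumulate the model's next-token probability of the outcome instead of a 0/1 hit indicator. The three-token vocabulary, the constant hazard values, and the `next_token_probs` interface are illustrative assumptions for this sketch, not the paper's ETHOS-ARES implementation.

```python
import random

# Toy generative "EHR model" with a three-token vocabulary. OUTCOME ends a
# trajectory with the event, DISCHARGE ends it without the event, and EVENT
# continues the timeline. All names and probabilities are illustrative.
OUTCOME, DISCHARGE, EVENT = "outcome", "discharge", "event"

def next_token_probs(prefix):
    # Hypothetical stand-in for a generative model's next-token distribution.
    # Here the hazard is constant; a real model would condition on the prefix.
    return {OUTCOME: 0.05, DISCHARGE: 0.15, EVENT: 0.80}

def sample_token(probs, rng):
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

def mc_estimate(n_samples, max_len=50, seed=0):
    """Standard Monte Carlo: each trajectory contributes a 0/1 indicator,
    so the estimate distribution is sparse and high-variance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        prefix = []
        for _ in range(max_len):
            tok = sample_token(next_token_probs(prefix), rng)
            prefix.append(tok)
            if tok == OUTCOME:
                hits += 1
                break
            if tok == DISCHARGE:
                break
    return hits / n_samples

def scope_estimate(n_samples, max_len=50, seed=0):
    """SCOPE-style estimator: along each sampled trajectory, accumulate the
    model's probability of the outcome token at every step instead of the
    0/1 sample indicator. This stays unbiased because sampling the prefix
    already weights each step by its probability of being reached."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        prefix = []
        for _ in range(max_len):
            probs = next_token_probs(prefix)
            total += probs[OUTCOME]  # credit the hazard, not the sample
            tok = sample_token(probs, rng)
            prefix.append(tok)
            if tok in (OUTCOME, DISCHARGE):
                break
    return total / n_samples
```

On this toy hazard both estimators target the same event probability, but SCOPE's per-trajectory contributions are continuous rather than binary, which is the source of the reduced sampling variance the abstract describes.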