Robustifying and Selecting Cohort-Appropriate Prognostic Models under Distributional Shifts

📅 2026-04-16

🤖 AI Summary
This study addresses the poor external calibration and limited transportability of prognostic models under distributional shift with two complementary strategies. For model developers, it tunes a "best-on-average" model toward covariate and outcome distributions derived from meta-analysis, used as an approximation of the broader target population. For end users, it proposes a simple measure of outcome similarity between cohorts to select, among published models, the one best suited to a given target cohort. Distributional mismatch is quantified with Kullback–Leibler (KL) divergence, calibration with the Integrated Calibration Index (ICI), and clinical utility with decision curve analysis (DCA). Across six surgical cohorts, higher KL divergence was associated with worse calibration (higher ICI). Meta-analysis-informed weighting improved calibration in most settings without materially affecting discrimination, with the clearest benefit on the aggregated external population (p = 0.037). Models developed in cohorts more similar to the target cohort achieved lower ICI and greater clinical utility on DCA.
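
As a rough illustration of how mismatch between two cohorts could be quantified, the sketch below computes the KL divergence between two discrete distributions, for example binned outcome frequencies in a training and a target cohort. The function name `kl_divergence`, the choice of categories, and the example counts are assumptions for illustration only; the summary does not state how the study discretized covariates or outcomes.

```python
import math


def kl_divergence(p, q):
    """Return KL(P || Q) in nats for two discrete distributions.

    `p` and `q` are sequences of non-negative counts or probabilities over
    the same ordered categories; both are normalized to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each distribution needs positive total mass")

    kl = 0.0
    for p_i, q_i in zip(p, q):
        p_i /= p_total
        q_i /= q_total
        if p_i == 0:
            continue  # a category with zero mass under P contributes nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        kl += p_i * math.log(p_i / q_i)
    return kl


# Hypothetical outcome counts (e.g. event / no event) in two cohorts.
training_cohort = [30, 70]
target_cohort = [45, 55]
print(kl_divergence(target_cohort, training_cohort))
```

KL divergence is asymmetric, so the direction (target relative to training, or the reverse) is a design choice that should be fixed before comparing cohorts.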

📝 Abstract
External validation is widely regarded as the gold standard for prognostic model evaluation. In this study, we challenge the assumption that successful external calibration guarantees model generalizability and propose two complementary strategies to improve transportability of prognostic models across cohorts. Using six real-world surgical cohorts from tertiary academic centers, we tested whether successful external calibration depends largely on similarity in covariates and outcomes between training and validation cohorts, quantified using Kullback-Leibler (KL) divergence, with calibration assessed by the Integrated Calibration Index (ICI). From the model-developer's perspective, we trained the "best-on-average" prognostic model by tuning toward a meta-analysis-derived covariate and outcome distribution as an approximation of the broader target population. From the end-user perspective, we proposed a simple measure for cohort outcome similarity to identify, among published models, the one most suitable for a given target cohort in terms of both calibration and clinical utility. External calibration worsened as distributional mismatch increased. Higher KL divergence was associated with higher ICI in both surgery-alone (Spearman $ρ=0.614$, $p=0.004$) and surgery + adjuvant chemotherapy cohorts (Spearman $ρ=0.738$, $p<0.001$). Meta-analysis-informed weighting improved calibration in most settings without materially affecting discrimination, with the clearest benefit when evaluated on the aggregated external population ($p=0.037$). Models developed in more similar cohorts achieved lower ICI in surgery-alone (Spearman $ρ=0.803$, $p<0.001$) and surgery + adjuvant chemotherapy cohorts (Spearman $ρ=0.737$, $p<0.001$), and provided greater clinical utility on DCA.
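
To make the two evaluation quantities concrete, here is a minimal sketch under stated assumptions. ICI is usually computed as the mean absolute difference between predicted risks and a smoothed (loess-type) calibration curve; the function `binned_ici` below swaps in equal-width bins for the smoother, so it is only a coarse stand-in and not the study's procedure. `net_benefit` follows the standard decision curve formula at a single threshold probability. All function names and the toy data are hypothetical.

```python
def binned_ici(y_true, y_prob, n_bins=10):
    """Binned approximation of ICI (assumption: bins replace loess smoothing).

    Returns the mean over individuals of |predicted risk - observed event
    rate in the equal-width bin that contains the predicted risk|.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal length")
    event_sums = [0] * n_bins
    counts = [0] * n_bins
    bin_index = []
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bin_index.append(b)
        event_sums[b] += y
        counts[b] += 1
    total = sum(abs(p - event_sums[b] / counts[b])
                for p, b in zip(y_prob, bin_index))
    return total / len(y_prob)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold: TP/n - FP/n * pt / (1 - pt)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data: observed binary outcomes and predicted risks (hypothetical).
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.8, 0.2, 0.1]
print(binned_ici(outcomes, risks, n_bins=2))
print([net_benefit(outcomes, risks, t) for t in (0.25, 0.5)])
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies.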
Problem

Research questions and friction points this paper is trying to address.

prognostic models
distributional shifts
external validation
calibration
cohort selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

distributional shift
external calibration
meta-analysis-informed modeling
cohort similarity
prognostic model transportability
Dimitris Bertsimas
Boeing Professor of Operations Research, MIT
Operations Research, Optimization, Stochastics, Analytics, Health Care
Carol Gao
Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
Angelos G. Koulouras
Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
Georgios Antonios Margonis
Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA; Charité – Universitätsmedizin Berlin, Berlin, 10117, Germany.