🤖 AI Summary
Individual risk prediction in clinical AI faces underappreciated triple uncertainty—estimation, model, and applicability uncertainty—challenging the assumption that high population-level performance (e.g., AUC > 0.9) ensures reliable individual predictions. Method: Leveraging real and synthetic ovarian cancer datasets, we trained 59,400 model variants and quantified inter-model variability in risk estimates for 100 patients. Contribution/Results: We demonstrate that individual risk standard deviations can reach ±0.3 despite strong aggregate discrimination. Increasing sample size only mitigates estimation uncertainty; model and applicability uncertainties dominate overall variability and remain undetectable within single-study frameworks. This is the first systematic evidence that these two uncertainties frequently exceed estimation uncertainty, undermining the “population performance implies individual reliability” paradigm. We advocate uncertainty-aware, clinician–patient shared decision-making over algorithmic replacement and propose a methodological framework for evaluating individual-level predictive distributions.
📝 Abstract
Background: Clinical prediction models for a health condition are commonly evaluated in terms of performance for a population, although decisions are made for individuals. The classic view relates uncertainty in risk estimates for individuals to sample size (estimation uncertainty), but uncertainty can also arise from model uncertainty (variability in modeling choices) and applicability uncertainty (variability in measurement procedures and between populations). Methods: We used real and synthetic data for ovarian cancer diagnosis to train 59,400 models with variations in estimation, model, and applicability uncertainty. We then used these models to estimate the probability of ovarian cancer in a fixed test set of 100 patients and evaluated the variability in individual estimates. Findings: We show empirically that estimation uncertainty can be strongly dominated by model uncertainty and applicability uncertainty, even for models that perform well at the population level. Estimation uncertainty decreased considerably with increasing training sample size, whereas model and applicability uncertainty remained large. Interpretation: Individual risk estimates are far more uncertain than often assumed. Model uncertainty and applicability uncertainty usually remain invisible when prediction models or algorithms are based on a single study. Predictive algorithms should inform, not dictate, care, and should support personalization through clinician-patient interaction rather than through inherently uncertain model outputs. Funding: This research is supported by Research Foundation Flanders (FWO) grants G097322N, G049312N, G0B4716N, and 12F3114N to BVC and/or DT, KU Leuven internal grants C24M/20/064 and C24/15/037 to BVC and/or DT, and ZonMw VIDI grant 09150172310023 to LW. DT is a senior clinical investigator of FWO.
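The core experimental design — retraining many model variants and inspecting the spread of each patient's predicted risk — can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: it uses a synthetic dataset and only two hypothetical sources of variability (bootstrap resamples as a proxy for estimation uncertainty, and different model classes as a crude proxy for model uncertainty); the dataset parameters and model choices are assumptions for illustration.

```python
# Sketch: per-patient variability of risk estimates across retrained model
# variants, contrasted with population-level AUC. Hypothetical setup only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a diagnostic dataset; hold out a fixed test set
# of 100 patients, mirroring the paper's fixed evaluation set.
X, y = make_classification(n_samples=2100, n_features=10, n_informative=6,
                           random_state=0)
X_test, y_test = X[:100], y[:100]
X_pool, y_pool = X[100:], y[100:]

# Two modeling choices (proxy for model uncertainty), each refit on
# bootstrap resamples of the training data (estimation uncertainty).
model_factories = [
    lambda: LogisticRegression(max_iter=1000),
    lambda: RandomForestClassifier(n_estimators=100),
]

preds = []  # one row of 100 individual risk estimates per trained variant
aucs = []
for make_model in model_factories:
    for _ in range(30):  # 30 bootstrap resamples per model class
        idx = rng.integers(0, len(y_pool), size=len(y_pool))
        model = make_model().fit(X_pool[idx], y_pool[idx])
        p = model.predict_proba(X_test)[:, 1]
        preds.append(p)
        aucs.append(roc_auc_score(y_test, p))

preds = np.array(preds)             # shape (60, 100): variants x patients
per_patient_sd = preds.std(axis=0)  # inter-model SD of each patient's risk
print(f"mean AUC across variants: {np.mean(aucs):.2f}")
print(f"largest per-patient SD of estimated risk: {per_patient_sd.max():.2f}")
```

Even when the mean AUC is high, some patients typically show a wide spread of risk estimates across variants — the single-study invisibility the abstract describes, since any one trained model reports only its own point estimate.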