Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models

📅 2026-02-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the instability of individual predictions from clinical risk prediction models across different training samples, a key limitation undermining their clinical reliability. To enhance prediction stability at the individual level without compromising interpretability, the authors propose a novel approach that directly integrates bootstrapping into the training process of deep neural networks. By imposing a regularization constraint on prediction variability across resampled datasets, the method encourages consistent outputs from a single model. Evaluated on benchmark clinical datasets—including GUSTO-I, Framingham, and SUPPORT—the approach significantly outperforms existing methods, achieving reduced mean absolute prediction differences while maintaining high discriminative performance and consistency in SHAP-based feature importance rankings.

📝 Abstract
Clinical prediction models are increasingly used to support patient care, yet many deep learning-based approaches remain unstable, as their predictions can vary substantially when trained on different samples from the same population. Such instability undermines reliability and limits clinical adoption. In this study, we propose a novel bootstrapping-based regularisation framework that embeds the bootstrapping process directly into the training of deep neural networks. This approach constrains prediction variability across resampled datasets, producing a single model with inherent stability properties. We evaluated models constructed using the proposed regularisation approach against conventional and ensemble models using simulated data and three clinical datasets: GUSTO-I, Framingham, and SUPPORT. Across all datasets, our model exhibited improved prediction stability, with lower mean absolute differences (e.g., 0.019 vs. 0.059 in GUSTO-I; 0.057 vs. 0.088 in Framingham) and markedly fewer significantly deviating predictions. Importantly, discriminative performance and feature importance consistency were maintained, with high SHAP correlations between models (e.g., 0.894 for GUSTO-I; 0.965 for Framingham). While ensemble models achieved greater stability, we show that this came at the expense of interpretability, as each constituent model used predictors in different ways. By regularising predictions to align with bootstrapped distributions, our approach allows prediction models to be developed that achieve greater robustness and reproducibility without sacrificing interpretability. This method provides a practical route toward more reliable and clinically trustworthy deep learning models, particularly valuable in data-limited healthcare settings.
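The abstract describes regularising a single network's predictions to align with bootstrapped prediction distributions. As a rough illustration only (not the authors' implementation), the idea can be sketched with a logistic model: fit reference models on bootstrap resamples, then train one final model whose loss adds a penalty pulling its predicted risks toward the bootstrap-mean prediction. The penalty weight `lambda_`, the learning rates, and the choice of a squared-difference penalty are all assumptions here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, epochs=200):
    # plain gradient-descent logistic regression (stand-in for a deep net)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

# toy clinical-style data: 400 patients, 5 predictors, binary outcome
n, d = 400, 5
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = (rng.random(n) < sigmoid(X @ beta)).astype(float)

# step 1: fit reference models on B bootstrap resamples and average
# their per-patient predicted risks
B = 20
boot_preds = []
for _ in range(B):
    idx = rng.integers(0, n, n)          # sample n rows with replacement
    w_b = fit_logreg(X[idx], y[idx])
    boot_preds.append(sigmoid(X @ w_b))
target = np.mean(boot_preds, axis=0)     # bootstrap-mean risk per patient

# step 2: train a single model with cross-entropy plus a stability
# penalty 0.5 * mean((p - target)^2); lambda_ is a hypothetical knob
lambda_ = 1.0
w = np.zeros(d)
for _ in range(300):
    p = sigmoid(X @ w)
    grad_ce = X.T @ (p - y) / n
    grad_stab = X.T @ ((p - target) * p * (1 - p)) / n
    w -= 0.1 * (grad_ce + lambda_ * grad_stab)

p_final = sigmoid(X @ w)                 # regularised predicted risks
```

The resulting single model is penalised whenever its predicted risk for a patient drifts away from what the bootstrap distribution supports, which is one plausible mechanism for the reduced mean absolute prediction differences reported above while keeping a single interpretable model rather than an ensemble.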
Problem

Research questions and friction points this paper is trying to address.

prediction instability
clinical risk prediction
deep learning
model reliability
bootstrapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

bootstrapping-based regularisation
prediction stability
deep learning
clinical risk prediction
model interpretability
Sara Matijevic
Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK
Christopher Yau
University of Oxford
Statistics · Health · Genomics · Machine Learning · Artificial Intelligence