🤖 AI Summary
In clinical prediction model (CPM) development, conventional fixed sample size calculations rest on a priori assumptions. When those assumptions are inaccurate, the calculated sample size can be too small, jeopardizing model stability and the reliability of individual-level predictions. To address this, we propose a sequential sample size determination method grounded in learning curve analysis. Our approach uses individual prediction uncertainty and classification instability as dynamic stopping criteria, alongside optimism-corrected calibration and discrimination measures, so that overfitting, calibration, and discriminative performance can be monitored as recruitment proceeds. The method combines logistic regression, bootstrap resampling, and sequential learning curve evaluation. Illustrated in acute kidney injury prediction modeling, the fixed calculation recommended 342 participants, whereas the sequential approach indicated about 1,100 participants to minimize overfitting at the population level and about 1,800 once small uncertainty and misclassification probability of individual predictions were also targeted.
📝 Abstract
When prospectively developing a new clinical prediction model (CPM), fixed sample size calculations are typically conducted before data collection based on sensible assumptions. But if the assumptions are inaccurate, the actual sample size required to develop a reliable model may be very different. To safeguard against this, adaptive sample size approaches have been proposed, based on sequential evaluation of a model's predictive performance. Aim: to illustrate and extend sequential sample size calculations for CPM development by (i) proposing stopping rules based on minimising uncertainty (instability) and misclassification of individual-level predictions, and (ii) showcasing how the sequential approach safeguards against inaccurate fixed sample size calculations. The sequential approach repeats the pre-defined model development strategy every time a chosen number (e.g., 100) of participants are recruited and adequately followed up. At each stage, CPM performance is evaluated using bootstrapping, leading to prediction and classification stability statistics and plots, alongside optimism-adjusted measures of calibration and discrimination. Our approach is illustrated for the development of a CPM for acute kidney injury using logistic regression. The fixed sample size calculation, based on perceived sensible assumptions, suggests recruiting 342 patients to minimise overfitting; however, the sequential approach reveals that a much larger sample size of 1100 is required to minimise overfitting (targeting population-level stability). If the stopping rule criteria also target small uncertainty and misclassification probability of individual predictions, the sequential approach suggests an even larger sample size (n=1800). Our sequential sample size approach allows users to dynamically monitor individual-level prediction and classification instability and safeguard against using inaccurate assumptions.
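The sequential loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a single predictor, a hand-written gradient-ascent logistic fit in place of a real modelling strategy, and simulated data. It also uses only one stopping rule, the mean width of each individual's 95% bootstrap interval for predicted risk. All function names, the `max_mean_width` threshold, and the simulation settings are hypothetical. Classification instability and optimism-adjusted calibration and discrimination are not covered.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=50, lr=2.0):
    """Fit intercept and slope by plain gradient ascent.

    Stand-in for a real fitter; assumes a roughly standardised predictor.
    """
    a = b = 0.0
    n = len(xs)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(a + b * x)
            grad_a += resid
            grad_b += resid * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def interval_widths(xs, ys, n_boot, rng):
    """Width of each individual's 95% bootstrap interval for predicted risk."""
    n = len(xs)
    preds = [[] for _ in range(n)]
    for _ in range(n_boot):
        # Resample participants with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_logistic([xs[i] for i in idx], [ys[i] for i in idx])
        # Apply the bootstrap model to every participant recruited so far.
        for i, x in enumerate(xs):
            preds[i].append(sigmoid(a + b * x))
    return [percentile(p, 0.975) - percentile(p, 0.025) for p in preds]


def sequential_sample_size(xs, ys, step, max_mean_width, n_boot, rng):
    """Re-evaluate after every `step` participants; stop once the mean
    interval width falls to `max_mean_width` (a hypothetical threshold)."""
    history = []
    for n in range(step, len(xs) + 1, step):
        widths = interval_widths(xs[:n], ys[:n], n_boot, rng)
        mean_width = sum(widths) / n
        history.append((n, mean_width))
        if mean_width <= max_mean_width:
            break
    return history


def simulate(n, rng):
    """Simulated cohort: one standard-normal predictor, true risk sigmoid(-1 + x)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(-1.0 + x) else 0 for x in xs]
    return xs, ys
```

As a usage example, `xs, ys = simulate(1000, random.Random(0))` followed by `sequential_sample_size(xs, ys, step=100, max_mean_width=0.1, n_boot=200, rng=random.Random(1))` returns a list of `(n, mean_width)` pairs, one per stage, ending at the first stage that meets the threshold or at the last stage if none does. In practice the fit would be replaced by the pre-defined model development strategy, and further stopping criteria would be checked at each stage.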