🤖 AI Summary
This study addresses the risks that updating clinical AI models can introduce: greater prediction instability, more arbitrary predictions, and reduced fairness across subpopulations, all of which can undermine the reliability of clinical decisions. Focusing on the prediction of severe hyperglycemic events in pediatric type 1 diabetes, the work presents the first systematic evaluation of how different model update strategies affect stability, arbitrariness, and fairness. Using four public datasets that cover 496 patients and approximately 11,300 weekly observations of continuous glucose monitoring data, together with structured sociodemographic variables, the authors develop a multidimensional continuous monitoring framework that quantifies predictive consistency and error equity. Their findings show that model updates can cause substantial prediction reversals and exacerbate error imbalances among subgroups. This underscores the need for continuous monitoring and offers methodological guidance for the responsible deployment of clinical AI systems.
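The prediction reversals mentioned above can be quantified by scoring the same cases with the model before and after an update and counting how many binary predictions change. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes binary predictions (1 = severe hyperglycemic event predicted) aligned case by case, and the function names `flip_rate` and `directional_flips` are hypothetical.

```python
import numpy as np


def flip_rate(old_pred, new_pred):
    """Fraction of cases whose binary prediction changes after a model update."""
    old_pred = np.asarray(old_pred, dtype=int)
    new_pred = np.asarray(new_pred, dtype=int)
    if old_pred.shape != new_pred.shape:
        raise ValueError("predictions must be aligned case by case")
    return float(np.mean(old_pred != new_pred))


def directional_flips(old_pred, new_pred):
    """Count flips in each direction (1 = severe hyperglycemia predicted)."""
    old_pred = np.asarray(old_pred, dtype=int)
    new_pred = np.asarray(new_pred, dtype=int)
    return {
        # flagged as high risk by the old model, no longer flagged after the update
        "pos_to_neg": int(np.sum((old_pred == 1) & (new_pred == 0))),
        # not flagged by the old model, newly flagged after the update
        "neg_to_pos": int(np.sum((old_pred == 0) & (new_pred == 1))),
    }


# Toy example: the same five patient-weeks scored by the old and the updated model
old = [1, 0, 1, 1, 0]
new = [1, 1, 0, 1, 0]
print(flip_rate(old, new))          # 0.4
print(directional_flips(old, new))  # {'pos_to_neg': 1, 'neg_to_pos': 1}
```

Splitting flips by direction is a deliberate choice in this sketch: a positive-to-negative flip means a previously flagged high-risk week is no longer flagged, which may carry a different clinical cost than a new alert.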
📝 Abstract
Artificial intelligence and machine learning (AI/ML) models are increasingly deployed to support clinical decision-making. However, when training data become stale because of changes in demographics, environment, or patient behavior, model performance can degrade substantially. Updating models with new training data is therefore necessary, but such updates may also introduce new risks. Using the prediction of severe hyperglycemic events in children with type 1 diabetes as a case study, we examine how different model update strategies can adversely affect model stability (e.g., by causing predictions to "flip" for a large number of cases after an update), increase the arbitrariness of predictions, or worsen accuracy equity and the balance of error rates across subpopulations. We propose multiple dimensions for continuous monitoring to detect these issues. We evaluate this monitoring framework on four publicly available U.S.-based type 1 diabetes datasets containing high-resolution continuous glucose monitoring (CGM) data, comprising approximately 11,300 weekly observations from 496 participants under 20 years of age; all datasets include structured sociodemographic information. We argue that such monitoring is essential for developing trustworthy clinical decision support systems.
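The abstract names arbitrariness and error-rate balance as monitoring dimensions but does not define them. As a hedged sketch only, the code below uses two common operationalizations that may differ from the paper's. Arbitrariness is measured as ambiguity: the share of cases on which models that differ only in an arbitrary training choice (assumed here to be the random seed) disagree. Error-rate balance is measured as the largest gap in false negative or false positive rate across subgroups, which could be, for example, the sociodemographic groups recorded in the datasets. All function names are hypothetical.

```python
import numpy as np


def ambiguity(pred_matrix):
    """Share of cases on which models trained with different seeds disagree.

    pred_matrix has shape (n_models, n_cases) and holds binary predictions.
    """
    P = np.asarray(pred_matrix, dtype=int)
    all_agree = np.all(P == P[0], axis=0)  # True where every model matches model 0
    return float(np.mean(~all_agree))


def subgroup_error_rates(y_true, y_pred, groups):
    """False negative and false positive rate per subgroup (NaN if undefined)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    groups = np.asarray(groups)
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        pos = in_group & (y_true == 1)  # observed severe hyperglycemic events
        neg = in_group & (y_true == 0)  # weeks without an event
        fnr = float(np.mean(y_pred[pos] == 0)) if pos.any() else float("nan")
        fpr = float(np.mean(y_pred[neg] == 1)) if neg.any() else float("nan")
        rates[g.item()] = {"fnr": fnr, "fpr": fpr}
    return rates


def max_gap(rates, key):
    """Largest difference in one error rate between any two subgroups."""
    vals = [r[key] for r in rates.values() if not np.isnan(r[key])]
    return max(vals) - min(vals) if vals else float("nan")


# Toy example: three seeds, four cases
print(ambiguity([[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]))  # 0.5

rates = subgroup_error_rates(
    y_true=[1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 0],
    groups=["A", "A", "A", "B", "B", "B"],
)
print(rates)  # {'A': {'fnr': 0.5, 'fpr': 0.0}, 'B': {'fnr': 0.0, 'fpr': 0.5}}
print(max_gap(rates, "fnr"), max_gap(rates, "fpr"))  # 0.5 0.5
```

Both rates are reported because, in an early-warning task like this one, missed events and false alarms carry different clinical costs. Which gap to monitor, and how large a gap is tolerable, are choices this sketch leaves open.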