🤖 AI Summary
Medical AI models for imaging often suffer from biased encoding and data drift, undermining clinical reliability; moreover, post-deployment performance degradation is difficult to assess without ground-truth labels. This paper systematically reviews bias assessment and unsupervised data drift detection methods, unifying three key technical strands: mechanistic attribution of bias origins, multi-granularity statistical testing for distributional shift, and uncertainty-aware, pseudo-label-driven unsupervised accuracy estimation. Based on this synthesis, the authors propose a lifecycle reliability assessment framework spanning development-phase trustworthiness validation and deployment-phase continuous monitoring. They further introduce a structured taxonomy of methodological approaches and an auditable best-practice guideline, enabling robust, interpretable, and sustainable clinical deployment of medical AI in line with FDA and CE regulatory requirements.
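To make the drift-monitoring strand concrete, here is a minimal sketch of two-sample statistical testing for distributional shift on model embeddings. It assumes numpy and scipy are available; the function name, the per-dimension Kolmogorov–Smirnov test, and the Bonferroni threshold are illustrative choices, not the paper's specific method.

```python
# Minimal sketch: per-dimension two-sample drift test on embeddings,
# with Bonferroni correction across dimensions. Illustrative only.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, incoming: np.ndarray, alpha: float = 0.05):
    """Flag drift between reference and incoming embeddings.

    reference: (n_ref, d) embeddings from the development/validation set.
    incoming:  (n_new, d) embeddings from post-deployment data.
    Returns (drift_flag, per-dimension p-values).
    """
    d = reference.shape[1]
    p_values = np.array([
        ks_2samp(reference[:, j], incoming[:, j]).pvalue for j in range(d)
    ])
    # Bonferroni-corrected significance: flag drift if any dimension shifts.
    drift = bool((p_values < alpha / d).any())
    return drift, p_values

# Usage: simulate a mean shift in a few dimensions of 64-d embeddings,
# e.g. an acquisition-protocol change at a new imaging site.
rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 64))
new = rng.normal(size=(200, 64))
new[:, :4] += 0.8
print(detect_drift(ref, new))  # -> (True, array of 64 p-values)
```

Testing embedding dimensions independently is the coarse end of "multi-granularity" monitoring; multivariate tests (e.g., MMD) on the same features would catch joint shifts that per-dimension tests miss.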
📝 Abstract
Machine Learning (ML) models have gained popularity in medical imaging analysis given their expert-level performance in many medical domains. To enhance the trustworthiness, acceptance, and regulatory compliance of medical imaging models and to facilitate their integration into clinical settings, we review and categorise methods for ensuring ML reliability, both during development and throughout the model's lifespan. Specifically, we provide an overview of methods for assessing a model's inner workings with respect to bias encoding, and of methods for detecting data drift in disease classification models. Additionally, to evaluate the severity of a significant drift, we provide an overview of methods developed for estimating classifier accuracy when ground-truth labels are unavailable. This should enable practitioners to implement methods that ensure reliable ML deployment and consistent prediction performance over time.
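As an illustration of label-free accuracy estimation, the sketch below follows the Average Thresholded Confidence (ATC) family of approaches: calibrate a confidence threshold on labeled validation data, then estimate deployment accuracy as the fraction of unlabeled predictions above that threshold. Function names and the simulated data are assumptions for illustration, not the paper's own implementation.

```python
# Minimal sketch of ATC-style accuracy estimation without ground-truth labels.
import numpy as np

def fit_atc_threshold(val_confidences: np.ndarray, val_correct: np.ndarray) -> float:
    """Pick a threshold t so that the share of validation samples with
    confidence >= t matches the observed validation accuracy."""
    acc = val_correct.mean()
    # The (1 - acc) quantile leaves an 'acc' fraction of confidences above t.
    return float(np.quantile(val_confidences, 1.0 - acc))

def estimate_accuracy(target_confidences: np.ndarray, t: float) -> float:
    """Estimated accuracy on unlabeled data = fraction above the threshold."""
    return float((target_confidences >= t).mean())

# Usage with max-softmax confidences from a disease classifier (simulated):
rng = np.random.default_rng(1)
val_conf = rng.beta(5, 2, size=1000)        # labeled validation confidences
val_correct = rng.random(1000) < val_conf   # correctness tracks confidence
t = fit_atc_threshold(val_conf, val_correct)
target_conf = rng.beta(4, 2.5, size=500)    # drifted deployment data
print(f"estimated accuracy: {estimate_accuracy(target_conf, t):.2f}")
```

Such estimates inherit the model's calibration: if confidences degrade under drift, the estimate degrades too, which is why the review pairs them with drift detection rather than treating them as a standalone monitor.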