🤖 AI Summary
Uncertainty quantification (UQ) remains fragmented and insufficiently integrated across medical AI systems, undermining model robustness, clinical trustworthiness, and regulatory compliance. To address this gap, we propose the first comprehensive UQ classification and adaptation framework spanning the full machine learning lifecycle in healthcare, from data preprocessing through model training to evaluation. We systematically survey empirical evidence for 12 mainstream UQ methods (including Bayesian deep learning, Monte Carlo dropout, ensemble estimation, temperature scaling, and out-of-distribution detection) across imaging, pathology, and electronic health record applications. We further assess the clinical applicability of emerging UQ techniques and identify key implementation barriers. Finally, we deliver a practical, stepwise UQ deployment roadmap. This work bridges methodological advances in UQ with real-world clinical deployment, providing both theoretical foundations and actionable guidance to enhance the reliability, interpretability, and trustworthy adoption of AI in healthcare.
📝 Abstract
Uncertainty Quantification (UQ) is pivotal in enhancing the robustness, reliability, and interpretability of Machine Learning (ML) systems for healthcare, which in turn helps optimize resources and improve patient care. Despite the emergence of ML-based clinical decision support tools, the lack of principled uncertainty quantification in ML models remains a major challenge. Existing reviews focus narrowly on state-of-the-art UQ in specific healthcare domains and do not systematically evaluate method efficacy across the stages of model development. Moreover, despite a growing body of research, UQ is still rarely implemented in healthcare applications. In this survey, we therefore provide a comprehensive analysis of current UQ in healthcare and offer a framework showing how different methods can be integrated into each stage of the ML pipeline, including data processing, training, and evaluation. We also highlight the methods most widely used in healthcare, along with novel approaches from other domains that hold potential for adoption in the medical context. We expect this study to provide a clear overview of the challenges and opportunities of implementing UQ in the ML pipeline for healthcare, guiding researchers and practitioners in selecting suitable techniques to enhance the reliability and safety of ML-driven healthcare solutions and the trust that patients and clinicians place in them.