🤖 AI Summary
Reliable out-of-distribution (OOD) detection in fetal ultrasound imaging is critical for the safe clinical deployment of deep learning models under heterogeneous real-world conditions. Method: We systematically evaluate eight uncertainty quantification methods across four representative classification tasks (e.g., gestational age staging, anatomical structure identification) and multiple in-distribution (ID) versus OOD splits that reflect distinct distribution shift patterns, including image quality degradation, anatomical variation, and inter-device discrepancies. Contribution/Results: OOD detection performance depends strongly on how well the task objective aligns semantically with the type of shift; no single task is best across all shift types. We also show that high OOD detection accuracy does not guarantee optimal rejection (abstention) decisions under clinical constraints. These findings motivate a task–shift–uncertainty co-design approach, in which the classification task and the uncertainty strategy are chosen together for the intended clinical application.
📝 Abstract
Reliable out-of-distribution (OOD) detection is important for the safe deployment of deep learning models in fetal ultrasound, where image characteristics and clinical settings are heterogeneous. OOD detection relies on estimating a classification model's uncertainty, which should increase for OOD samples. While existing research has largely focused on uncertainty quantification methods, this work investigates the impact of the classification task itself. Through experiments with eight uncertainty quantification methods across four classification tasks, we demonstrate that OOD detection performance varies significantly with the task, and that the best task depends on how the in-distribution (ID) and OOD sets are defined; specifically, whether a sample is OOD because of: i) an image characteristic shift or ii) an anatomical feature shift. Furthermore, we reveal that superior OOD detection does not guarantee optimal abstained prediction, underscoring the need to align task selection and uncertainty strategies with the specific downstream application in medical image analysis.
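
The two evaluation views contrasted in the abstract, OOD detection and abstained prediction, can be illustrated with a minimal sketch. It assumes predictive entropy as the uncertainty score, which is one common choice and not necessarily among the eight methods evaluated. The function names, toy logits, and labels are all hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy_uncertainty(probs):
    # Predictive entropy: higher means the classifier is less certain.
    # (Assumed score for illustration only.)
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def ood_auroc(unc_id, unc_ood):
    # AUROC for "uncertainty separates OOD from ID": the probability that a
    # random OOD sample has higher uncertainty than a random ID sample,
    # with ties counted as one half.
    diff = unc_ood[:, None] - unc_id[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def accuracy_after_abstention(probs, labels, uncertainty, coverage):
    # Keep the `coverage` fraction of samples with the lowest uncertainty,
    # abstain on the rest, and report accuracy on the kept samples.
    n_keep = max(1, int(round(coverage * len(labels))))
    keep = np.argsort(uncertainty)[:n_keep]
    return float((probs[keep].argmax(axis=1) == labels[keep]).mean())

# Hypothetical toy data: three ID samples (with labels) and two OOD samples.
id_logits = np.array([[4.0, 0.0, 0.0], [0.0, 5.0, 0.0], [1.0, 0.8, 0.0]])
id_labels = np.array([0, 1, 1])
ood_logits = np.array([[0.1, 0.0, 0.0], [0.0, 0.0, 0.2]])

id_probs, ood_probs = softmax(id_logits), softmax(ood_logits)
unc_id, unc_ood = entropy_uncertainty(id_probs), entropy_uncertainty(ood_probs)

auroc = ood_auroc(unc_id, unc_ood)
acc_full = accuracy_after_abstention(id_probs, id_labels, unc_id, coverage=1.0)
acc_partial = accuracy_after_abstention(id_probs, id_labels, unc_id, coverage=0.67)
print(auroc, acc_full, acc_partial)
```

The first metric asks whether uncertainty ranks OOD samples above ID samples. The second asks whether uncertainty ranks likely misclassifications above correct predictions. Because these are different rankings, a task and method that score well on one need not score well on the other, which is the distinction the abstract draws.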