OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection

📅 2025-03-20
🤖 AI Summary
Medical AI models are often unreliable on out-of-distribution (OOD) medical images, and existing OOD detection methods lack clinically grounded, systematic evaluation. To address this, the authors introduce an open, clinically grounded OOD detection benchmark for medical imaging, covering three clinical domains and 14 real-world datasets stratified into covariate-shifted in-distribution, near-OOD, and far-OOD categories, alongside a standardized evaluation protocol. A systematic evaluation of 24 post-hoc OOD detection approaches shows that findings from general-purpose image OOD benchmarks do not transfer to medical settings, revealing significant performance gaps across clinical tasks and OOD types. The benchmark enables reproducible, extensible robustness validation, filling a critical gap in the trustworthy evaluation of medical AI, and provides foundational infrastructure and empirical insights to support the safe, clinically viable deployment of AI models in healthcare.
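Post-hoc OOD detectors like those benchmarked here score inputs using a frozen classifier's outputs, with no retraining. As a minimal sketch (assuming the suite includes standard logit-based scores such as maximum softmax probability and the energy score, which is typical of post-hoc method collections; the specific functions below are illustrative, not the paper's code):

```python
import numpy as np

def msp_score(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability: higher = more in-distribution."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def energy_score(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Negative free energy T*logsumexp(logits/T): higher = more in-distribution."""
    z = logits / T
    m = z.max(axis=1, keepdims=True)  # stable log-sum-exp
    return T * (m.squeeze(1) + np.log(np.exp(z - m).sum(axis=1)))

# A confident prediction (first row) scores higher than an ambiguous one.
logits = np.array([[5.0, 0.1, 0.2],
                   [1.1, 1.0, 0.9]])
msp = msp_score(logits)
energy = energy_score(logits)
```

At test time, inputs whose score falls below a threshold calibrated on in-distribution validation data would be flagged as OOD rather than classified.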

📝 Abstract
The growing reliance on Artificial Intelligence (AI) in critical domains such as healthcare demands robust mechanisms to ensure the trustworthiness of these systems, especially when faced with unexpected or anomalous inputs. This paper introduces the Open Medical Imaging Benchmarks for Out-Of-Distribution Detection (OpenMIBOOD), a comprehensive framework for evaluating out-of-distribution (OOD) detection methods specifically in medical imaging contexts. OpenMIBOOD includes three benchmarks from diverse medical domains, encompassing 14 datasets divided into covariate-shifted in-distribution, near-OOD, and far-OOD categories. We evaluate 24 post-hoc methods across these benchmarks, providing a standardized reference to advance the development and fair comparison of OOD detection methods. Results reveal that findings from broad-scale OOD benchmarks in natural image domains do not translate to medical applications, underscoring the critical need for such benchmarks in the medical field. By mitigating the risk of exposing AI models to inputs outside their training distribution, OpenMIBOOD aims to support the advancement of reliable and trustworthy AI systems in healthcare. The repository is available at https://github.com/remic-othr/OpenMIBOOD.
Problem

Research questions and friction points this paper is trying to address.

Evaluates OOD detection in medical imaging
Introduces benchmarks for diverse medical datasets
Supports reliable AI systems in healthcare
Innovation

Methods, ideas, or system contributions that make the work stand out.

OpenMIBOOD framework for OOD detection
Three benchmarks with 14 datasets
Evaluation of 24 post-hoc methods
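Standardized comparison of such scoring methods is typically reported as AUROC: how well a method's scores separate in-distribution from near- or far-OOD samples. A self-contained sketch of that metric (a rank-based Mann-Whitney formulation; the exact protocol in the benchmark may differ):

```python
import numpy as np
from scipy.stats import rankdata

def auroc(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """AUROC for 'higher score = in-distribution', via the rank-sum statistic."""
    scores = np.concatenate([id_scores, ood_scores])
    is_id = np.concatenate([np.ones(len(id_scores)), np.zeros(len(ood_scores))])
    ranks = rankdata(scores)  # average ranks handle tied scores
    n_pos, n_neg = len(id_scores), len(ood_scores)
    rank_sum = ranks[is_id == 1].sum()
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUROC of 1.0 means perfect separation; 0.5 means the score is no better than chance at distinguishing OOD inputs, which is roughly what the paper reports for several natural-image methods on medical data.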
Max Gutbrod
Regensburg Medical Image Computing (ReMIC), OTH Regensburg, Regensburg, 93053, Germany; Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Regensburg, 93053, Germany
David Rauber
PhD Student, Ostbayerische Technische Hochschule
machine learning; medical image computing
D. W. Nunes
Regensburg Medical Image Computing (ReMIC), OTH Regensburg, Regensburg, 93053, Germany; Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Regensburg, 93053, Germany
Christoph Palm
Professor (Full), Ostbayerische Technische Hochschule Regensburg, ReMIC
medical image computing; AI; machine learning; image segmentation and classification; computer vision