OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection

📅 2025-03-20
🤖 AI Summary
Medical AI models are often unreliable on out-of-distribution (OOD) medical images, and existing OOD detection methods lack clinically grounded, systematic evaluation. To address this, the authors introduce an open, clinically grounded OOD detection benchmark for medical imaging, covering three clinical domains and 14 real-world datasets stratified into covariate-shifted in-distribution, near-OOD, and far-OOD categories, alongside a standardized evaluation protocol. A systematic evaluation of 24 post-hoc OOD detection approaches shows that findings from general-purpose image OOD benchmarks do not transfer to medical settings, revealing significant performance gaps across clinical tasks and OOD types. The benchmark enables reproducible, extensible robustness validation, filling a critical gap in the trustworthy evaluation of medical AI, and provides foundational infrastructure and empirical insights to support the safe, clinically viable deployment of AI models in healthcare.
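Post-hoc OOD detectors like those benchmarked here score inputs using a frozen classifier's outputs, with no retraining. As a minimal sketch (assuming the suite includes standard logit-based scores such as maximum softmax probability and the energy score, which is typical of post-hoc method collections; the specific functions below are illustrative, not the paper's code):

```python
import numpy as np

def msp_score(logits: np.ndarray) -> np.ndarray:
    """Maximum softmax probability: higher = more in-distribution."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def energy_score(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Negative free energy T*logsumexp(logits/T): higher = more in-distribution."""
    z = logits / T
    m = z.max(axis=1, keepdims=True)  # stable log-sum-exp
    return T * (m.squeeze(1) + np.log(np.exp(z - m).sum(axis=1)))

# A confident prediction (first row) scores higher than an ambiguous one.
logits = np.array([[5.0, 0.1, 0.2],
                   [1.1, 1.0, 0.9]])
msp = msp_score(logits)
energy = energy_score(logits)
```

At test time, inputs whose score falls below a threshold calibrated on in-distribution validation data would be flagged as OOD rather than classified.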

📝 Abstract
The growing reliance on Artificial Intelligence (AI) in critical domains such as healthcare demands robust mechanisms to ensure the trustworthiness of these systems, especially when faced with unexpected or anomalous inputs. This paper introduces the Open Medical Imaging Benchmarks for Out-Of-Distribution Detection (OpenMIBOOD), a comprehensive framework for evaluating out-of-distribution (OOD) detection methods specifically in medical imaging contexts. OpenMIBOOD includes three benchmarks from diverse medical domains, encompassing 14 datasets divided into covariate-shifted in-distribution, near-OOD, and far-OOD categories. We evaluate 24 post-hoc methods across these benchmarks, providing a standardized reference to advance the development and fair comparison of OOD detection methods. Results reveal that findings from broad-scale OOD benchmarks in natural image domains do not translate to medical applications, underscoring the critical need for such benchmarks in the medical field. By mitigating the risk of exposing AI models to inputs outside their training distribution, OpenMIBOOD aims to support the advancement of reliable and trustworthy AI systems in healthcare. The repository is available at https://github.com/remic-othr/OpenMIBOOD.
Problem

Research questions and friction points this paper is trying to address.

Evaluates OOD detection in medical imaging
Introduces benchmarks for diverse medical datasets
Supports reliable AI systems in healthcare
Innovation

Methods, ideas, or system contributions that make the work stand out.

OpenMIBOOD framework for OOD detection
Three benchmarks with 14 datasets
Evaluation of 24 post-hoc methods
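Standardized comparison of such scoring methods is typically reported as AUROC: how well a method's scores separate in-distribution from near- or far-OOD samples. A self-contained sketch of that metric (a rank-based Mann-Whitney formulation; the exact protocol in the benchmark may differ):

```python
import numpy as np
from scipy.stats import rankdata

def auroc(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """AUROC for 'higher score = in-distribution', via the rank-sum statistic."""
    scores = np.concatenate([id_scores, ood_scores])
    is_id = np.concatenate([np.ones(len(id_scores)), np.zeros(len(ood_scores))])
    ranks = rankdata(scores)  # average ranks handle tied scores
    n_pos, n_neg = len(id_scores), len(ood_scores)
    rank_sum = ranks[is_id == 1].sum()
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUROC of 1.0 means perfect separation; 0.5 means the score is no better than chance at distinguishing OOD inputs, which is roughly what the paper reports for several natural-image methods on medical data.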
Max Gutbrod
Regensburg Medical Image Computing (ReMIC), OTH Regensburg, Regensburg, 93053, Germany; Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Regensburg, 93053, Germany
David Rauber
PhD Student, Ostbayerische Technische Hochschule
machine learning; medical image computing
D. W. Nunes
Regensburg Medical Image Computing (ReMIC), OTH Regensburg, Regensburg, 93053, Germany; Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Regensburg, 93053, Germany
Christoph Palm
Professor (Full), Ostbayerische Technische Hochschule Regensburg, ReMIC
medical image computing; AI; machine learning; image segmentation and classification; computer vision