🤖 AI Summary
To address unreliable out-of-distribution (OOD) detection by pretrained models in AI-based medical imaging systems, this paper proposes a plug-and-play, post-hoc normalizing flow framework based on RealNVP. The flow estimates likelihoods solely from semantic features extracted by the frozen pretrained model, so it requires no fine-tuning or architectural modification. The work presents this as the first semantic-driven OOD detection paradigm tailored to medical imaging: it does not rely on pixel-level statistics, and because the pretrained model and its representations are left untouched, it remains practical for clinical deployment. Evaluated on MedMNIST and the newly introduced MedOOD benchmark, the method achieves AUROC scores of 93.80% and 84.61%, respectively, outperforming ten state-of-the-art baselines. To support reproducibility and further research, the model and the code used to build the MedOOD dataset, which simulates clinically relevant distributional shifts, are publicly released.
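The core idea (fit a normalizing flow to feature vectors from a frozen network, then flag inputs whose features have low likelihood) can be sketched in NumPy. This is a toy under stated assumptions, not the paper's implementation: the RealNVP-style coupling layers use linear conditioner networks rather than the deeper networks a practical model would use, features are standardized with training-set statistics, training is plain full-batch gradient descent with hand-written gradients, and the "features" are synthetic Gaussian vectors standing in for backbone embeddings. All class and function names (`AffineCoupling`, `FeatureFlow`, `ood_score`) are hypothetical.

```python
import numpy as np


class AffineCoupling:
    """Toy RealNVP affine coupling layer with linear conditioner nets (an assumption).

    The `cond` half passes through unchanged and parameterizes an affine map
    of the `trans` half: y_trans = x_trans * exp(s) + t, where
    s = tanh(x_cond @ Ws + bs) and t = x_cond @ Wt + bt, so log|det J| = sum(s).
    """

    def __init__(self, cond, trans, rng):
        self.cond, self.trans = cond, trans
        k, m = len(cond), len(trans)
        self.p = {"Ws": 0.01 * rng.standard_normal((k, m)), "bs": np.zeros(m),
                  "Wt": 0.01 * rng.standard_normal((k, m)), "bt": np.zeros(m)}

    def forward(self, x):
        a, b = x[:, self.cond], x[:, self.trans]
        s = np.tanh(a @ self.p["Ws"] + self.p["bs"])
        t = a @ self.p["Wt"] + self.p["bt"]
        y = x.copy()
        y[:, self.trans] = b * np.exp(s) + t
        self.cache = (a, b, s)  # reused by backward()
        return y, s.sum(axis=1)

    def backward(self, g_y, g_logdet):
        """Backprop dL/dy and dL/d(logdet) into parameter grads; return dL/dx."""
        a, b, s = self.cache
        e = np.exp(s)
        g_t = g_y[:, self.trans]  # dL/dt equals dL/dy_trans
        g_h = (g_t * b * e + g_logdet[:, None]) * (1.0 - s ** 2)
        self.grad = {"Ws": a.T @ g_h, "bs": g_h.sum(0), "Wt": a.T @ g_t, "bt": g_t.sum(0)}
        g_x = np.empty_like(g_y)
        g_x[:, self.cond] = g_y[:, self.cond] + g_h @ self.p["Ws"].T + g_t @ self.p["Wt"].T
        g_x[:, self.trans] = g_t * e
        return g_x


class FeatureFlow:
    """Stack of coupling layers mapping standardized features toward N(0, I)."""

    def __init__(self, dim, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = np.arange(dim // 2), np.arange(dim // 2, dim)
        # Alternate the transformed half so every feature dimension gets updated.
        self.layers = [AffineCoupling(lo, hi, rng) if i % 2 == 0 else AffineCoupling(hi, lo, rng)
                       for i in range(n_layers)]

    def log_prob(self, x):
        """Change of variables: log p(x) = log N(f(x); 0, I) + sum of layer log-dets."""
        z, logdet = x, np.zeros(len(x))
        for layer in self.layers:
            z, ld = layer.forward(z)
            logdet += ld
        log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * z.shape[1] * np.log(2 * np.pi)
        return log_base + logdet, z

    def fit(self, x, steps=300, lr=0.02):
        """Maximum likelihood via full-batch gradient descent on the mean NLL."""
        n = len(x)
        for _ in range(steps):
            _, z = self.log_prob(x)
            g, g_ld = z / n, np.full(n, -1.0 / n)  # dNLL/dz and dNLL/d(logdet)
            for layer in reversed(self.layers):
                g = layer.backward(g, g_ld)
            for layer in self.layers:
                for name in layer.p:
                    layer.p[name] -= lr * layer.grad[name]
        return self


def ood_score(flow, mu, sd, feats):
    """Negative log-likelihood of standardized features: higher = more likely OOD."""
    return -flow.log_prob((feats - mu) / sd)[0]


# Synthetic stand-ins for frozen-backbone features (not real MedMNIST data).
rng = np.random.default_rng(0)
D = 8
mix = rng.standard_normal((D, D))
id_train = rng.standard_normal((2000, D)) @ mix
id_test = rng.standard_normal((500, D)) @ mix
ood_test = rng.standard_normal((500, D)) @ mix + 4.0  # mean-shifted "unseen" inputs

mu, sd = id_train.mean(axis=0), id_train.std(axis=0)
flow = FeatureFlow(D).fit((id_train - mu) / sd)
s_id, s_ood = ood_score(flow, mu, sd, id_test), ood_score(flow, mu, sd, ood_test)
```

Only `log_prob` ever sees the features, so the backbone that produced them is never retrained or modified; in this sketch, that separation is what makes the approach post-hoc.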
📝 Abstract
Out-of-distribution (OOD) detection is crucial in AI-driven medical imaging to ensure reliability and safety by identifying inputs outside a model's training distribution. Existing methods often require retraining or modifying pre-trained models, which is impractical for clinical applications. This study introduces a post-hoc, normalizing flow-based approach that integrates with pre-trained models without altering them. The normalizing flow estimates the likelihood of feature vectors extracted from a pre-trained model, capturing semantically meaningful representations without relying on pixel-level statistics. The method was evaluated on the MedMNIST benchmark and on a newly curated MedOOD dataset that simulates clinically relevant distributional shifts. Performance was measured with standard OOD detection metrics (e.g., AUROC, FPR@95, AUPR_IN, AUPR_OUT), and statistical analyses compared the method against ten baseline methods. On MedMNIST, the proposed model achieved an AUROC of 93.80%, outperforming state-of-the-art methods. On MedOOD, it achieved an AUROC of 84.61%, again outperforming the other methods. Because it is post-hoc, it is compatible with existing clinical workflows, addressing a key limitation of previous approaches. The model and the code to build the OOD datasets are available at https://github.com/dlotfi/MedOODFlow.
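The metrics named above can be computed from per-sample OOD scores. Below is a minimal NumPy sketch, assuming higher score means more likely OOD (for example, a negative log-likelihood). It uses one common convention: OOD is the positive class for AUROC and AUPR_OUT, ID is the positive class for AUPR_IN and FPR@95. Evaluation code, including the released repository, may differ in positive-class choice, tie handling, or threshold interpolation, so this is illustrative and not a way to reproduce the reported numbers. The function names are hypothetical.

```python
import numpy as np


def auroc(pos, neg):
    """P(pos score > neg score), ties counted as 1/2 (Mann-Whitney U form)."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    greater = (pos[:, None] > neg[None, :]).sum()  # O(n*m) memory; fine for a sketch
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(pos, neg):
    """PR-curve area as the mean precision at each positive's rank.

    Ties are broken in favor of positives (stable sort, positives listed first).
    """
    scores = np.concatenate([pos, neg]).astype(float)
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    labels = labels[np.argsort(-scores, kind="stable")]  # rank by descending score
    precision = np.cumsum(labels) / np.arange(1, len(labels) + 1)
    return precision[labels == 1].mean()


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD inputs still accepted as ID at the threshold keeping 95% of ID."""
    threshold = np.percentile(id_scores, 95)
    return np.mean(np.asarray(ood_scores, float) <= threshold)


def ood_metrics(id_scores, ood_scores):
    """Assumes higher score = more likely OOD."""
    id_s, ood_s = np.asarray(id_scores, float), np.asarray(ood_scores, float)
    return {
        "AUROC": auroc(ood_s, id_s),                  # OOD as positive class
        "FPR@95": fpr_at_95_tpr(id_s, ood_s),         # ID as positive class
        "AUPR_IN": average_precision(-id_s, -ood_s),  # ID positive, scores negated
        "AUPR_OUT": average_precision(ood_s, id_s),   # OOD positive
    }
```

For ID scores `[0.1, 0.4]` and OOD scores `[0.35, 0.8]`, three of the four ID/OOD pairs are correctly ordered, so `ood_metrics` reports an AUROC of 0.75.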