AI Summary
To address reliability and interpretability challenges in out-of-distribution (OOD) detection for medical image diagnosis, this paper proposes a neuron-level saliency-driven OOD detection framework. Methodologically, it introduces, for the first time, an OOD scoring mechanism grounded in pairwise neuronal response correlations; constructs class-aware saliency clustering centers with an interpretable distance metric; and jointly models scaled saliency bias and feature norm to enhance discriminative capability. Evaluated on the Kvasir and GastroVision gastrointestinal endoscopy datasets, the method surpasses existing state-of-the-art approaches in OOD detection accuracy while providing instance- and neuron-level interpretability. It thus establishes a new paradigm for robust, trustworthy, and clinically deployable OOD identification in AI-assisted diagnostic systems.
Abstract
Ensuring reliability is paramount in deep learning, particularly in medical imaging, where diagnostic decisions often hinge on model outputs. Prior research has shown that the capacity to separate out-of-distribution (OOD) samples is a valuable indicator of a model's reliability. In medical imaging, this is especially critical, as identifying OOD inputs can help flag potential anomalies that might otherwise go undetected. While many OOD detection methods rely on feature or logit space representations, recent works suggest these approaches may not fully capture OOD diversity. To address this, we propose a novel OOD scoring mechanism, called NERO, that leverages neuron-level relevance at the feature layer. Specifically, we cluster neuron-level relevance for each in-distribution (ID) class to form representative centroids and introduce a relevance distance metric to quantify a new sample's deviation from these centroids, enhancing OOD separability. Additionally, we refine performance by incorporating scaled relevance in the bias term and combining feature norms. Our framework also enables explainable OOD detection. We validate its effectiveness across multiple deep learning architectures on the gastrointestinal imaging benchmarks Kvasir and GastroVision, achieving improvements over state-of-the-art OOD detection methods.
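The centroid-and-distance idea described in the abstract can be illustrated with a minimal sketch. This is not the NERO implementation: the abstract does not give the relevance definition, the clustering procedure, or the distance metric, so the code below makes several assumptions. Relevance is taken to be the feature activation times the classifier weight of the chosen class, each class gets a single centroid (the mean relevance of its training samples) instead of a full clustering step, and the distance is Euclidean. The scaled-relevance bias term and the feature-norm combination are omitted. All function names and the toy data are hypothetical.

```python
import numpy as np


def neuron_relevance(features, weights, classes):
    # ASSUMPTION: relevance of each feature-layer neuron for a class is
    # activation * classifier weight for that class (not taken from the paper).
    return features * weights[classes]


def class_centroids(features, labels, weights, num_classes):
    # ASSUMPTION: one centroid per ID class, the mean relevance vector of
    # that class's training samples (stands in for a clustering step).
    relevance = neuron_relevance(features, weights, labels)
    return np.stack(
        [relevance[labels == c].mean(axis=0) for c in range(num_classes)]
    )


def ood_score(features, weights, centroids):
    # Predict a class from the logits, then measure how far the sample's
    # relevance vector lies from that class's centroid (Euclidean, assumed).
    predicted = (features @ weights.T).argmax(axis=1)
    relevance = neuron_relevance(features, weights, predicted)
    return np.linalg.norm(relevance - centroids[predicted], axis=1)


# Toy data: 3 ID classes, 3 feature-layer neurons, bias-free linear classifier.
weights = np.eye(3) + 0.1
train_features = np.array([
    [2.0, 0.0, 0.0], [2.2, 0.1, 0.0],   # class 0
    [0.0, 2.0, 0.0], [0.1, 2.2, 0.0],   # class 1
    [0.0, 0.0, 2.0], [0.0, 0.1, 2.2],   # class 2
])
train_labels = np.array([0, 0, 1, 1, 2, 2])

centroids = class_centroids(train_features, train_labels, weights, num_classes=3)

test_features = np.array([
    [2.1, 0.05, 0.0],   # resembles class 0 training samples
    [6.0, 5.0, 4.0],    # activation pattern unlike any training class
])
scores = ood_score(test_features, weights, centroids)
print(scores)  # a larger score means further from the ID relevance centroid
```

In this toy setup the second sample is also assigned to class 0 by the classifier, yet its relevance vector sits far from the class 0 centroid, so it receives the larger score. A threshold on the score, chosen on held-out ID data, would then decide whether to flag a sample as OOD.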