🤖 AI Summary
Deep neural networks trained for image classification often produce overconfident predictions on out-of-distribution (OOD) inputs, and existing representation-learning approaches to OOD detection, while effective, incur long training times and additional detection overhead. MoLAR (Mixture of Exemplars) addresses this by training a classifier on top of a frozen, pretrained Vision Transformer foundation model and scoring inputs by their similarity to a small set of exemplars: images chosen to be representative of the in-distribution (ID) dataset. Because only the exemplars are compared at test time, MoLAR achieves up to 30 times faster OOD detection inference than methods that perform best when scanning the full ID dataset, and in some cases the exemplar-only comparison actually improves performance. Extensive experiments show MoLAR outperforms comparable approaches in both supervised and semi-supervised settings, and the implementation is open source.
📝 Abstract
One of the early weaknesses identified in deep neural networks trained for image classification tasks was their inability to provide low confidence predictions on out-of-distribution (OOD) data that was significantly different from the in-distribution (ID) data used to train them. Representation learning, where neural networks are trained in specific ways that improve their ability to detect OOD examples, has emerged as a promising solution. However, these approaches require long training times and can add additional overhead to detect OOD examples. Recent developments in Vision Transformer (ViT) foundation models (large networks trained on large and diverse datasets with self-supervised approaches) also show strong performance in OOD detection, and could address these challenges. This paper presents Mixture of Exemplars (MoLAR), an efficient approach to tackling OOD detection challenges that is designed to maximise the benefit of training a classifier with a high quality, frozen, pretrained foundation model backbone. MoLAR provides strong OOD performance when only comparing the similarity of OOD examples to the exemplars, a small set of images chosen to be representative of the dataset, leading to up to 30 times faster OOD detection inference over other methods that provide best performance when the full ID dataset is used. In some cases, only using these exemplars actually improves performance with MoLAR. Extensive experiments demonstrate the improved OOD detection performance of MoLAR in comparison to comparable approaches in both supervised and semi-supervised settings, and code is available at github.com/emannix/molar-mixture-of-exemplars.
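The exemplar-based scoring described above can be sketched roughly as follows. This is a minimal illustration based only on the abstract's description, not the paper's actual implementation: the `ood_score` function, the max-similarity decision rule, and the use of plain cosine similarity are all assumptions.

```python
import numpy as np

def ood_score(features: np.ndarray, exemplars: np.ndarray) -> np.ndarray:
    """Score each sample by its maximum cosine similarity to the exemplar set.

    features:  (n_samples, d) backbone embeddings of test images
    exemplars: (n_exemplars, d) embeddings of representative ID images
    Returns a score per sample; lower scores suggest the sample is OOD.
    """
    # L2-normalise rows so the dot product equals cosine similarity
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    e = exemplars / np.linalg.norm(exemplars, axis=1, keepdims=True)
    sims = f @ e.T                # (n_samples, n_exemplars)
    return sims.max(axis=1)       # nearest-exemplar similarity
```

Because the comparison involves only a small exemplar set rather than the full ID dataset, this style of scoring is what would make inference substantially cheaper than full-dataset nearest-neighbour methods.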