🤖 AI Summary
In the zero-shot CLIP era, out-of-distribution (OOD) detection performance degrades when unknown classes emerge, as conventional methods rely solely on either closed-world supervised models or pure zero-shot CLIP, both of which are ill-suited for OOD discrimination on their own. Method: We propose a heterogeneous ensemble framework that jointly integrates a closed-world supervised classifier, a zero-shot CLIP classifier, and a linear probe trained on CLIP features. Crucially, our approach is the first to systematically unify supervised and zero-shot paradigms via a lightweight, post-hoc ensemble strategy, requiring no additional large-scale training. Contribution/Results: Evaluated on CIFAR-100 and ImageNet under challenging settings, including label noise and covariate shift, our method achieves state-of-the-art OOD detection performance, significantly outperforming single-model and CLIP-only baselines. It demonstrates superior robustness, computational efficiency, and generalizability across diverse distributional shifts.
📝 Abstract
Out-of-distribution (OOD) detection is an important building block in trustworthy image recognition systems, as unknown classes may arise at test time. OOD detection methods typically revolve around a single classifier, leading to a split in the research field between the classical supervised setting (e.g. a ResNet18 classifier trained on CIFAR100) and the zero-shot setting (class names fed as prompts to CLIP). In both cases, an overarching challenge is that OOD detection performance is implicitly constrained by the classifier's capabilities on in-distribution (ID) data. In this work, we show that given a little open-mindedness from both ends, remarkable OOD detection can be achieved by instead creating a heterogeneous ensemble: COOkeD combines the predictions of a closed-world classifier trained end-to-end on a specific dataset, a zero-shot CLIP classifier, and a linear probe classifier trained on CLIP image features. While bulky at first sight, this approach is modular, post-hoc, and leverages the availability of pre-trained VLMs, and thus introduces little overhead compared to training a single standard classifier. We evaluate COOkeD on the popular CIFAR100 and ImageNet benchmarks, but also consider more challenging, realistic settings ranging from training-time label noise, to test-time covariate shift, to zero-shot shift, which has previously been overlooked. Despite its simplicity, COOkeD achieves state-of-the-art performance and greater robustness compared to both classical and CLIP-based OOD detection methods. Code is available at https://github.com/glhr/COOkeD
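To make the ensemble idea concrete, here is a minimal sketch of combining the three classifiers' predictions into an OOD score. This is an illustrative assumption, not the paper's exact combination rule: it averages the three class-probability distributions and uses the maximum softmax probability (MSP) of the averaged distribution as the ID-ness score (higher means more likely in-distribution). The function name `cooked_score` and the averaging/MSP choices are hypothetical; see the linked repository for the actual method.

```python
import numpy as np

def cooked_score(probs_closed: np.ndarray,
                 probs_zeroshot: np.ndarray,
                 probs_probe: np.ndarray) -> np.ndarray:
    """Hypothetical ensemble OOD score.

    Each input is an (N, C) array of per-sample class probabilities from
    one of the three classifiers: the closed-world end-to-end model, the
    zero-shot CLIP classifier, and the CLIP-feature linear probe.
    The three distributions are averaged per sample, and the max softmax
    probability of the average serves as the score (higher = more ID).
    """
    avg = (probs_closed + probs_zeroshot + probs_probe) / 3.0
    return avg.max(axis=-1)

# Toy usage: a sample on which all three classifiers agree confidently
# should score higher than one where they are all uncertain.
confident = np.array([[0.90, 0.05, 0.05]])
uncertain = np.array([[0.34, 0.33, 0.33]])
score_id = cooked_score(confident, confident, confident)
score_ood = cooked_score(uncertain, uncertain, uncertain)
```

Because the combination is post-hoc, each classifier can be trained (or used zero-shot) independently, and the ensemble adds only a cheap averaging step at inference time.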