🤖 AI Summary
In the zero-shot CLIP era, out-of-distribution (OOD) detection performance degrades when unknown classes emerge, as conventional methods rely solely on either closed-world supervised models or pure zero-shot CLIP, both of which are ill-suited for OOD discrimination on their own. Method: We propose a heterogeneous ensemble framework that jointly integrates a closed-world supervised classifier, a zero-shot CLIP classifier, and a linear probe trained on CLIP features. Crucially, our approach is the first to systematically unify supervised and zero-shot paradigms via a lightweight, post-hoc ensemble strategy, requiring no additional large-scale training. Contribution/Results: Evaluated on CIFAR-100 and ImageNet under challenging settings, including label noise and covariate shift, our method achieves state-of-the-art OOD detection performance, significantly outperforming single-model and CLIP-only baselines. It demonstrates superior robustness, computational efficiency, and generalizability across diverse distributional shifts.
📝 Abstract
Out-of-distribution (OOD) detection is an important building block in trustworthy image recognition systems, as unknown classes may arise at test time. OOD detection methods typically revolve around a single classifier, leading to a split in the research field between the classical supervised setting (e.g. a ResNet18 classifier trained on CIFAR100) and the zero-shot setting (class names fed as prompts to CLIP). In both cases, an overarching challenge is that OOD detection performance is implicitly constrained by the classifier's capabilities on in-distribution (ID) data. In this work, we show that given a little open-mindedness from both ends, remarkable OOD detection can be achieved by instead creating a heterogeneous ensemble: COOkeD combines the predictions of a closed-world classifier trained end-to-end on a specific dataset, a zero-shot CLIP classifier, and a linear probe classifier trained on CLIP image features. While bulky at first sight, this approach is modular, post-hoc, and leverages the availability of pre-trained VLMs, and thus introduces little overhead compared to training a single standard classifier. We evaluate COOkeD on the popular CIFAR100 and ImageNet benchmarks, but also consider more challenging, realistic settings ranging from training-time label noise, to test-time covariate shift, to zero-shot shift, which has previously been overlooked. Despite its simplicity, COOkeD achieves state-of-the-art performance and greater robustness compared to both classical and CLIP-based OOD detection methods. Code is available at https://github.com/glhr/COOkeD
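To make the ensemble idea concrete, here is a minimal sketch of combining the three classifiers' predictions into an OOD score. This is an illustrative assumption, not the paper's exact combination rule: it averages the three class-probability distributions and uses the maximum softmax probability (MSP) of the averaged distribution as the ID-ness score (higher means more likely in-distribution). The function name `cooked_score` and the averaging/MSP choices are hypothetical; see the linked repository for the actual method.

```python
import numpy as np

def cooked_score(probs_closed: np.ndarray,
                 probs_zeroshot: np.ndarray,
                 probs_probe: np.ndarray) -> np.ndarray:
    """Hypothetical ensemble OOD score.

    Each input is an (N, C) array of per-sample class probabilities from
    one of the three classifiers: the closed-world end-to-end model, the
    zero-shot CLIP classifier, and the CLIP-feature linear probe.
    The three distributions are averaged per sample, and the max softmax
    probability of the average serves as the score (higher = more ID).
    """
    avg = (probs_closed + probs_zeroshot + probs_probe) / 3.0
    return avg.max(axis=-1)

# Toy usage: a sample on which all three classifiers agree confidently
# should score higher than one where they are all uncertain.
confident = np.array([[0.90, 0.05, 0.05]])
uncertain = np.array([[0.34, 0.33, 0.33]])
score_id = cooked_score(confident, confident, confident)
score_ood = cooked_score(uncertain, uncertain, uncertain)
```

Because the combination is post-hoc, each classifier can be trained (or used zero-shot) independently, and the ensemble adds only a cheap averaging step at inference time.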