🤖 AI Summary
In open-world settings where ground-truth labels are unavailable, automatically selecting the best out-of-distribution (OOD) detection model remains an unsolved challenge. Method: This paper proposes MetaOOD, the first zero-shot, unsupervised meta-learning framework for OOD model selection. It leverages language models to generate embeddings that capture the OOD characteristics of datasets and detectors, enabling task-similarity modeling across datasets; a meta-learner trained on the historical performance of existing detectors then ranks candidate models for a new task without any labels. Results: Evaluated across 24 test dataset pairs and 11 OOD detectors, MetaOOD outperforms all 11 baselines (a result validated with Wilcoxon statistical tests) while incurring only marginal inference overhead. It offers a reliable, low-dependency route to adaptive OOD model selection, which matters for safety-sensitive real-time applications such as online transaction monitoring, autonomous driving, and clinical decision support.
📝 Abstract
How can we automatically select an out-of-distribution (OOD) detection model for various underlying tasks? This is crucial for maintaining the reliability of open-world applications by identifying data distribution shifts, particularly in critical domains such as online transactions, autonomous driving, and real-time patient diagnosis. Despite the availability of numerous OOD detection methods, the challenge of selecting an optimal model for diverse tasks remains largely underexplored, especially in scenarios lacking ground-truth labels. In this work, we introduce MetaOOD, the first zero-shot, unsupervised framework that utilizes meta-learning to select an OOD detection model automatically. As a meta-learning approach, MetaOOD leverages historical performance data of existing methods across various benchmark OOD detection datasets, enabling the effective selection of a suitable model for new datasets without the need for labeled data at test time. To quantify task similarities more accurately, we introduce language model-based embeddings that capture the distinctive OOD characteristics of both datasets and detection models. Through extensive experimentation with 24 unique test dataset pairs and a pool of 11 OOD detection models to choose from, we demonstrate that MetaOOD significantly outperforms existing methods while incurring only marginal time overhead. Our results, validated by Wilcoxon statistical tests, show that MetaOOD surpasses a diverse group of 11 baselines, including established OOD detectors and advanced unsupervised selection methods.
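The core selection idea the abstract describes — embed a new task, find similar historical tasks, and reuse their recorded detector performance to pick a model without test labels — can be sketched as a minimal nearest-neighbor variant. All embeddings, detector names, and scores below are invented placeholders, and MetaOOD's actual meta-predictor is more involved than this one-nearest-task lookup:

```python
import numpy as np

# Hypothetical meta-train data: one embedding per historical task
# (e.g., derived from a language-model description of the dataset pair).
train_embeddings = np.array([
    [0.9, 0.1, 0.0],   # task 0
    [0.1, 0.8, 0.1],   # task 1
    [0.0, 0.2, 0.9],   # task 2
])

# performance[i, j] = recorded score (e.g., AUROC) of detector j on task i.
performance = np.array([
    [0.91, 0.72, 0.65],
    [0.60, 0.88, 0.70],
    [0.55, 0.62, 0.93],
])
detectors = ["MSP", "Energy", "KNN"]  # illustrative detector names

def select_detector(test_embedding, train_embeddings, performance, names):
    """Zero-shot selection: locate the most similar meta-train task by
    cosine similarity, then return the detector that scored highest on
    that task. No labels from the test task are used."""
    e = test_embedding / np.linalg.norm(test_embedding)
    T = train_embeddings / np.linalg.norm(
        train_embeddings, axis=1, keepdims=True
    )
    nearest = int(np.argmax(T @ e))              # most similar historical task
    best = int(np.argmax(performance[nearest]))  # best detector on that task
    return names[best]

# A new task whose embedding resembles task 0 is assigned task 0's winner.
choice = select_detector(
    np.array([0.85, 0.15, 0.05]), train_embeddings, performance, detectors
)
print(choice)  # → MSP
```

The zero-shot property comes from the lookup operating entirely on embeddings and historical scores: nothing about the test task beyond its embedding is consulted, which is why no ground-truth labels are needed at selection time.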