🤖 AI Summary
Amid the rise of vision-language models (VLMs) and large vision-language models (LVLMs), out-of-distribution (OOD) detection and related tasks (anomaly detection, novelty detection, open-set recognition, and outlier detection) suffer from conceptual ambiguity and fragmented paradigms. This paper proposes “Generalized OOD Detection v2”, a unified framework that systematically clarifies the semantic boundaries and evolutionary relationships among these tasks. It identifies OOD detection and anomaly detection as the central remaining challenges and formalizes the new evaluation paradigms and problem settings introduced by LVLMs (e.g., GPT-4V). Drawing on analysis of CLIP-based semantic alignment, task taxonomy modeling, comparison of how benchmarks have evolved, and a cross-task review of methods, the work synthesizes over 100 studies. The resulting VLM-driven framework revisits the field's foundational assumptions, pinpoints key technical challenges (including semantic misalignment, inconsistent evaluation, and LVLM-specific failure modes), and outlines concrete directions for future research in this rapidly evolving domain.
📝 Abstract
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed that taxonomically categorizes all five. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm and blurred the boundaries between these fields, once again confusing researchers. In this survey, we first present generalized OOD detection v2, which captures the evolution of AD, ND, OSR, OOD detection, and OD in the VLM era. Our framework reveals that, as some fields have become inactive and others have merged, the most demanding challenges are now OOD detection and AD. We also highlight significant shifts in definitions, problem settings, and benchmarks, and accordingly provide a comprehensive review of OOD detection methodology, including a discussion of the related tasks to clarify their relationship to OOD detection. Finally, we explore advances in the emerging era of Large Vision Language Models (LVLMs) such as GPT-4V. We conclude the survey with open challenges and future directions.