Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

📅 2024-07-31

🏛️ arXiv.org

📈 Citations: 12

✨ Influential: 0

🤖 AI Summary

With the rise of vision-language models (VLMs) and large vision-language models (LVLMs), out-of-distribution (OOD) detection and its related tasks (anomaly detection, novelty detection, open-set recognition, and outlier detection) have become conceptually ambiguous and fragmented across paradigms. This survey proposes “Generalized OOD Detection v2”, a unified framework that clarifies the semantic boundaries of these five tasks and how they have evolved, identifies OOD detection and anomaly detection as the remaining central challenges, and describes the new evaluation paradigms and problem settings introduced by LVLMs (e.g., GPT-4V). Combining CLIP-based semantic alignment analysis, a task taxonomy, a comparison of how benchmarks have evolved, and a cross-task review of methods, the work synthesizes over 100 studies. It revisits the field's foundational assumptions, points to open technical challenges, including semantic misalignment, evaluation inconsistency, and LVLM-specific failure modes, and outlines directions for future research.

📝 Abstract

Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed, taxonomically categorizing these five problems. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm and blurred the boundaries between these fields, again confusing researchers. In this survey, we first present a generalized OOD detection v2, encapsulating the evolution of AD, ND, OSR, OOD detection, and OD in the VLM era. Our framework reveals that, with some field inactivity and integration, the demanding challenges have become OOD detection and AD. In addition, we also highlight the significant shift in the definition, problem settings, and benchmarks; we thus feature a comprehensive review of the methodology for OOD detection, including the discussion over other related tasks to clarify their relationship to OOD detection. Finally, we explore the advancements in the emerging Large Vision Language Model (LVLM) era, such as GPT-4V. We conclude this survey with open challenges and future directions.

Problem

Research questions and friction points this paper is trying to address.

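VLMs such as CLIP have blurred the boundaries among OOD detection, AD, ND, OSR, and OD, so the earlier generalized OOD detection taxonomy no longer fits the field

It is unclear which of these tasks remain active challenges in the VLM era; the survey argues they are OOD detection and AD

Definitions, problem settings, and benchmarks for OOD detection have shifted, and the methodology needs to be re-reviewed for the VLM and LVLM era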

Innovation

Methods, ideas, or system contributions that make the work stand out.

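Proposes generalized OOD detection v2, a framework that captures how AD, ND, OSR, OOD detection, and OD have evolved in the VLM era

Reviews OOD detection methodology and discusses related tasks to clarify their relationship to OOD detection

Explores advancements in the emerging LVLM era (e.g., GPT-4V) and closes with open challenges and future directions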
