Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

📅 2024-07-31
🏛️ arXiv.org
📈 Citations: 12
Influential: 0
🤖 AI Summary
Amid the rise of vision-language models (VLMs) and large vision-language models (LVLMs), out-of-distribution (OOD) detection and related tasks (anomaly detection, novelty detection, open-set recognition, and outlier detection) suffer from conceptual ambiguity and fragmented paradigms. This paper proposes “Generalized OOD Detection v2”, a unified framework that clarifies the semantic boundaries and evolutionary relationships among these tasks, identifies OOD detection and anomaly detection as the central challenges, and formalizes the evaluation paradigms and problem settings introduced by LVLMs (e.g., GPT-4V). Drawing on CLIP-based semantic alignment analysis, task taxonomy modeling, benchmark evolution comparison, and cross-task method review, the work synthesizes over 100 studies. The resulting framework revisits foundational assumptions, pinpoints technical challenges (including semantic misalignment, evaluation inconsistency, and LVLM-specific failure modes), and outlines directions for future research.

📝 Abstract
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed, taxonomically categorizing these five problems. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm and blurred the boundaries between these fields, again confusing researchers. In this survey, we first present a generalized OOD detection v2, encapsulating the evolution of AD, ND, OSR, OOD detection, and OD in the VLM era. Our framework reveals that, with some field inactivity and integration, the demanding challenges have become OOD detection and AD. We also highlight the significant shift in the definition, problem settings, and benchmarks; we thus feature a comprehensive review of the methodology for OOD detection, including a discussion of other related tasks to clarify their relationship to OOD detection. Finally, we explore the advancements in the emerging Large Vision Language Model (LVLM) era, such as GPT-4V. We conclude this survey with open challenges and future directions.
Problem

Research questions and friction points this paper is trying to address.

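VLMs such as CLIP have blurred the boundaries between AD, ND, OSR, OOD detection, and OD that the earlier generalized OOD detection framework drew
Definitions, problem settings, and benchmarks for OOD detection and anomaly detection have shifted in the VLM era
Methodology for OOD detection and related tasks needs review as the field moves into the LVLM era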
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes generalized OOD detection v2 framework
Reviews methodology for OOD and related tasks
Explores advancements in Large Vision Language Models
Atsuyuki Miyai
The University of Tokyo
computer vision, representation learning, out-of-distribution detection
Jingkang Yang
PhD, MMLab@NTU
Visual Perception, Visual Reasoning, Multimodality, Open World
Jingyang Zhang
Department of Electrical and Computer Engineering, Duke University, Durham, NC, United States
Yifei Ming
Salesforce AI Research
Large Language Model, LLM Agent, AI Safety, Vision Language Model
Yueqian Lin
PhD Student, Duke University
Qing Yu
Department of Information and Communication Engineering, The University of Tokyo, Japan; LY Corporation, Japan
Go Irie
Tokyo University of Science
Pattern Recognition, Machine Learning, Multimedia
Shafiq R. Joty
Salesforce AI Research, Palo Alto, CA, United States; NTU, Singapore (on leave)
Yixuan Li
Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, United States
Hai Li
Department of Electrical and Computer Engineering, Duke University, Durham, NC, United States
Ziwei Liu
Associate Professor, Nanyang Technological University
Computer Vision, Machine Learning, Computer Graphics
T. Yamasaki
Department of Information and Communication Engineering, The University of Tokyo, Japan
Kiyoharu Aizawa
University of Tokyo
Multimedia, Computer Vision, Food Computing, Manga, Virtual Exploration