Out-of-Distribution Detection with Positive and Negative Prompt Supervision Using Large Language Models

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-language models suffer from overly generalized negative prompts that introduce intra-class semantic overlap or misleading cues, degrading out-of-distribution (OOD) detection performance. To address this, we propose a positive-negative prompt supervision framework: large language models generate category-relevant initial prompts, and a prompt optimization strategy then steers negative prompts toward inter-class boundary features rather than broad non-in-distribution (non-ID) information. A semantic graph structure further propagates the optimized textual supervision to the visual branch, enhancing the multimodal discriminative capability of the energy-based OOD detector. Extensive experiments on the CIFAR-100 and ImageNet-1K benchmarks, covering eight OOD datasets and five large language models, demonstrate that our method significantly outperforms existing state-of-the-art approaches.

📝 Abstract
Out-of-distribution (OOD) detection aims to delineate the classification boundaries between in-distribution (ID) and OOD images. Recent advances in vision-language models (VLMs) have demonstrated remarkable OOD detection performance by integrating both visual and textual modalities. In this context, negative prompts are introduced to emphasize the dissimilarity between image features and prompt content. However, these prompts often include a broad range of non-ID features, which may result in suboptimal outcomes due to the capture of overlapping or misleading information. To address this issue, we propose Positive and Negative Prompt Supervision, which encourages negative prompts to capture inter-class features and transfers this semantic knowledge to the visual modality to enhance OOD detection performance. Our method begins with class-specific positive and negative prompts initialized by large language models (LLMs). These prompts are subsequently optimized, with positive prompts focusing on features within each class, while negative prompts highlight features around category boundaries. Additionally, a graph-based architecture is employed to aggregate semantic-aware supervision from the optimized prompt representations and propagate it to the visual branch, thereby enhancing the performance of the energy-based OOD detector. Extensive experiments on two benchmarks, CIFAR-100 and ImageNet-1K, across eight OOD datasets and five different LLMs, demonstrate that our method outperforms state-of-the-art baselines.
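The energy-based detector described above can be sketched in a few lines. This is a minimal illustration of a free-energy score computed from image-prompt cosine similarities in a CLIP-style embedding space; the two-term positive/negative form, the `temperature` value, and the function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def energy_ood_score(image_feat, pos_prompts, neg_prompts, temperature=0.01):
    """Free-energy OOD score from positive and negative prompt similarities.

    A low score (strong pull toward positive, class-specific prompts)
    suggests an ID image; high similarity to negative (boundary) prompts
    pushes the score up, toward the OOD side. Sketch only: this simple
    two-term combination is an assumption for illustration.
    """
    def cos_sim(a, B):
        # Cosine similarity between one image feature and each prompt row.
        a = a / np.linalg.norm(a)
        B = B / np.linalg.norm(B, axis=1, keepdims=True)
        return B @ a

    s_pos = cos_sim(image_feat, pos_prompts)
    s_neg = cos_sim(image_feat, neg_prompts)

    # Free energy (negative log-sum-exp) over each prompt set.
    e_pos = -temperature * np.log(np.sum(np.exp(s_pos / temperature)))
    e_neg = -temperature * np.log(np.sum(np.exp(s_neg / temperature)))

    # Lower score => more likely in-distribution.
    return e_pos - e_neg
```

With a small temperature the log-sum-exp approaches the maximum similarity, so the score behaves like "distance to the best negative prompt minus distance to the best positive prompt".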
Problem

Research questions and friction points this paper is trying to address.

Enhancing OOD detection by optimizing positive and negative prompt supervision
Addressing suboptimal performance from overlapping features in negative prompts
Transferring semantic knowledge from text to visual modality for boundary delineation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Positive and Negative Prompt Supervision for OOD detection
LLM-initialized prompts optimized for intra-class and boundary features
Graph-based architecture propagates semantic supervision to visual branch
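The graph-based propagation step can likewise be sketched. Here the semantic graph is built from top-k cosine similarities among prompt embeddings, followed by one message-passing step and a similarity-weighted injection into the visual features; the graph construction, `top_k`, the 0.5 mixing weight, and all function names are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def propagate_prompt_supervision(prompt_embs, visual_feats, top_k=2):
    """One message-passing step on a semantic prompt graph (sketch).

    Nodes are prompt embeddings; edges connect each node to its top-k
    most similar neighbors. Aggregated prompt semantics are then mixed
    into the visual features via a softmax attention over prompts.
    """
    # Normalize prompts and build a cosine-similarity matrix (no self-loops).
    P = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sim = P @ P.T
    np.fill_diagonal(sim, -np.inf)

    # Keep only the top-k strongest edges per node.
    adj = np.full_like(sim, -np.inf)
    for i in range(len(sim)):
        nbrs = np.argsort(sim[i])[-top_k:]
        adj[i, nbrs] = sim[i, nbrs]

    # Softmax over neighbors, then aggregate their embeddings (one GNN step).
    weights = np.exp(adj - adj.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    aggregated = weights @ P

    # Inject the aggregated semantic supervision into the visual branch
    # via similarity-weighted mixing (0.5 is an arbitrary choice here).
    V = visual_feats / np.linalg.norm(visual_feats, axis=1, keepdims=True)
    attn = np.exp(V @ aggregated.T)
    attn /= attn.sum(axis=1, keepdims=True)
    return V + 0.5 * (attn @ aggregated)
```

The design choice illustrated here is that supervision flows text-to-vision: prompt embeddings are refined among themselves first, and only the refined representations touch the visual features.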
Zhixia He
School of New Media and Communication, Tianjin University, Tianjin, China
Chen Zhao
Department of Computer Science, Baylor University, Waco, Texas, USA
Minglai Shao
Tianjin University
Graph Mining · Deep Learning · Machine Learning
Xintao Wu
University of Arkansas
Data Mining · Privacy and Security · Trustworthy AI · AI4Science
Xujiang Zhao
Researcher at NEC Laboratories America
AI Safety · Trustworthy AI · Uncertainty · LLM · Reinforcement Learning
Dong Li
Department of Computer Science, Baylor University, Waco, Texas, USA
Qin Tian
College of Intelligence and Computing, Tianjin University, Tianjin, China
Linlin Yu
University of Texas at Dallas
Uncertainty Estimation · Trustworthy AI · Graph Neural Network · NLP