AI Summary
To address the insufficient out-of-distribution (OOD) detection performance of vision-language models (VLMs) in few-shot settings, this paper proposes a "coarse-to-fine" local prompt tuning paradigm: global pretrained prompts are frozen, while only scalable, region-specific local prompts are optimized. We introduce global-guided negative sample augmentation and local feature regularization to enhance discriminative capability for anomalous regions. Key contributions include: (i) the first local-prompt-driven framework for few-shot OOD detection; (ii) a region-relevance metric that quantifies localized abnormal responses; and (iii) plug-and-play compatibility with arbitrary pretrained global prompts. Evaluated on ImageNet-1K under a 4-shot setting, our method reduces average false positive rate at 95% true positive rate (FPR95) by 5.17%, outperforming state-of-the-art methods trained with 16 shots.
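FPR95, the headline metric above, is the fraction of OOD samples wrongly accepted at the score threshold where 95% of in-distribution (ID) samples are kept. A minimal sketch of how it is typically computed (the function name and score convention are illustrative, not from the paper):

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: false positive rate on OOD samples at the threshold
    where 95% of in-distribution samples are accepted.
    Higher scores are assumed to mean 'more in-distribution'."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold that keeps 95% of ID samples: the 5th percentile of ID scores.
    thresh = np.percentile(id_scores, 5.0)
    # OOD samples scoring at or above the threshold are false positives.
    return float(np.mean(ood_scores >= thresh))
```

Lower is better, so the reported 5.17% reduction means fewer outliers slip past the 95%-TPR operating point.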
Abstract
Out-of-Distribution (OOD) detection, which aims to distinguish outliers from known categories, has gained prominence in practical scenarios. Recently, the advent of vision-language models (VLMs) has heightened interest in enhancing OOD detection for VLMs through few-shot tuning. However, existing methods mainly focus on optimizing global prompts, ignoring refined utilization of local information with regard to outliers. Motivated by this, we freeze global prompts and introduce Local-Prompt, a novel coarse-to-fine tuning paradigm that emphasizes regional enhancement with local prompts. Our method comprises two integral components: global prompt guided negative augmentation and local prompt enhanced regional regularization. The former utilizes frozen, coarse global prompts as guiding cues to incorporate negative augmentation, thereby leveraging local outlier knowledge. The latter employs trainable local prompts and a regional regularization to capture local information effectively, aiding in outlier identification. We also propose a regional-related metric to enrich OOD detection. Moreover, since our approach enhances only local prompts, it can be seamlessly integrated with trained global prompts during inference to boost performance. Comprehensive experiments demonstrate the effectiveness and potential of our method. Notably, our method reduces average FPR95 by 5.17% against the state-of-the-art method in 4-shot tuning on the challenging ImageNet-1K dataset, even outperforming 16-shot results of previous methods. Code is released at https://github.com/AuroraZengfh/Local-Prompt.
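To make the regional idea concrete, here is an illustrative sketch (assumptions labeled; this is not the paper's exact formulation) of a regional-related score: each image patch embedding is matched against both ID text prompts and negative (outlier) prompts, and the detection score is the maximum over patches of the softmax mass assigned to ID prompts.

```python
import numpy as np

def regional_ood_score(patch_feats, id_text_feats, neg_text_feats, tau=0.01):
    """Illustrative regional score (hypothetical helper, not the paper's API).
    Each patch is scored against ID and negative prompt embeddings; the image
    score is the max over patches of the softmax probability mass on ID prompts.
    Shapes: patch_feats (P, D), id_text_feats (C, D), neg_text_feats (M, D).
    All embeddings are assumed L2-normalized; tau is a temperature."""
    sims = patch_feats @ np.concatenate([id_text_feats, neg_text_feats]).T  # (P, C+M)
    logits = sims / tau
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)            # per-patch softmax
    id_mass = probs[:, : id_text_feats.shape[0]].sum(axis=1)  # per-patch ID probability
    return float(id_mass.max())  # regional maximum as the image-level score
```

A high score means some region strongly matches an ID prompt; a uniformly low score flags the image as a likely outlier. Since only the local prompt embeddings would be trained in such a scheme, frozen or separately trained global prompts could contribute an additional image-level score at inference.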