PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection

📅 2024-10-27

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the insufficient robustness of Vision Transformers (ViTs) in out-of-distribution (OOD) detection, this paper proposes the Prior-augmented Vision Transformer (PViT). PViT takes class logits from a pretrained model as transferable priors and is trained to predict class logits conditioned on them. At inference, it measures how far its predicted logits diverge from the priors, using metrics such as KL divergence or cosine distance, which enables unsupervised OOD identification. The method requires no additional data modeling, generative components, or architectural modifications to the ViT backbone. It introduces a "prior-guided confidence" mechanism that shapes the decision boundary between in-distribution (ID) and OOD samples. Evaluated on ImageNet against more than seven OOD datasets, PViT is reported to improve significantly over state-of-the-art methods: up to a 12.3% reduction in FPR95 and up to a 4.1% improvement in AUROC.
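The scoring idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the direction of the KL divergence (prior to prediction), and the fixed threshold are all assumptions made for illustration, and only the Python standard library is used.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def ood_score(pred_logits, prior_logits, metric="kl"):
    """Hypothetical divergence score: larger means prediction and prior disagree more.

    Assumption: KL is taken from the prior distribution to the predicted one;
    the paper may define the direction or normalisation differently.
    """
    if metric == "kl":
        return kl_divergence(softmax(prior_logits), softmax(pred_logits))
    if metric == "cosine":
        return cosine_distance(pred_logits, prior_logits)
    raise ValueError(f"unknown metric: {metric}")


def is_ood(pred_logits, prior_logits, threshold, metric="kl"):
    """Flag a sample as OOD when the divergence exceeds a chosen threshold."""
    return ood_score(pred_logits, prior_logits, metric) > threshold


# Illustrative logits only, not taken from the paper.
prior = [4.0, 0.5, 0.1]
agreeing_prediction = [3.8, 0.6, 0.2]
conflicting_prediction = [0.1, 0.5, 4.0]
print(ood_score(agreeing_prediction, prior))
print(ood_score(conflicting_prediction, prior))
```

In practice the threshold would be chosen on held-out ID data, for example so that 95% of ID samples fall below it.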

📝 Abstract

Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their robustness against data distribution shifts and their inherent inductive biases remain underexplored. To enhance the robustness of ViT models for image Out-of-Distribution (OOD) detection, we introduce a novel and generic framework named Prior-augmented Vision Transformer (PViT). Taking as input the prior class logits from a pretrained model, we train PViT to predict the class logits. During inference, PViT identifies OOD samples by quantifying the divergence between the predicted class logits and the prior logits obtained from pretrained models. Unlike existing state-of-the-art (SOTA) OOD detection methods, PViT shapes the decision boundary between in-distribution (ID) and OOD samples by utilizing the proposed prior-guided confidence, without requiring additional data modeling, generation methods, or structural modifications. Extensive experiments on the large-scale ImageNet benchmark, evaluated against more than seven OOD datasets, demonstrate that PViT significantly outperforms existing SOTA OOD detection methods in terms of FPR95 and AUROC. The codebase is publicly available at https://github.com/RanchoGoose/PViT.
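For readers unfamiliar with the two reported metrics, the following sketch shows one common way to compute them. It assumes the usual convention that ID is the positive class and that a higher confidence means "more in-distribution" (for a divergence-based detector, confidence could be the negated divergence). The function names and sample values are hypothetical, and the pairwise AUROC is written for clarity rather than speed.

```python
def fpr_at_95_tpr(id_conf, ood_conf):
    """Fraction of OOD samples accepted as ID when at least 95% of ID samples are accepted.

    The threshold is the ID confidence below which at most 5% of ID samples fall.
    """
    sorted_id = sorted(id_conf)
    k = len(sorted_id) // 20  # index that leaves at most 5% of ID samples below it
    threshold = sorted_id[k]
    accepted_ood = sum(1 for s in ood_conf if s >= threshold)
    return accepted_ood / len(ood_conf)


def auroc(id_conf, ood_conf):
    """Probability that a random ID sample scores higher than a random OOD sample.

    Ties count as half a win. This O(n * m) form is equivalent to the area
    under the ROC curve.
    """
    wins = 0.0
    for i in id_conf:
        for o in ood_conf:
            if i > o:
                wins += 1.0
            elif i == o:
                wins += 0.5
    return wins / (len(id_conf) * len(ood_conf))


# Illustrative confidences only, not results from the paper.
id_confidences = [0.9, 0.8, 0.7, 0.6]
ood_confidences = [0.5, 0.4, 0.3]
print(fpr_at_95_tpr(id_confidences, ood_confidences))
print(auroc(id_confidences, ood_confidences))
```

Lower FPR95 and higher AUROC both indicate better separation between ID and OOD samples.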

Problem

Research questions and friction points this paper is trying to address.

Vision Transformers (ViTs)

Robustness Improvement

Out-of-Distribution (OOD) Detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

PViT

Pre-trained Model Prior

Prior-Guided Confidence for OOD Detection

🔎 Similar Papers

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens