Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors

📅 2025-10-02

🤖 AI Summary

Problem: Few-shot, class-agnostic anomaly detection for industrial safety inspection is inherently difficult: anomalies are subtle and diverse, and only a few normal samples are available to separate normal from anomalous features. Method: We propose a lightweight approach driven by embedding differences. It exploits the strong correlation between anomaly severity and local discrepancies in the embedding space of pretrained vision encoders (e.g., DINOv3). Instead of introducing complex architectures or requiring auxiliary supervision, we design a learnable nonlinear projection operator that makes explicit the anomaly discriminability already implicit in these representations. The method models the natural image distribution on the embedding manifold using only a few normal samples and localizes out-of-distribution anomalies via difference heatmaps. Results: The approach achieves state-of-the-art performance across multiple industrial anomaly detection benchmarks, reduces model parameters by an order of magnitude, generalizes well across categories, and remains effective across diverse foundation encoders.

📝 Abstract

Few-shot anomaly detection streamlines and simplifies industrial safety inspection. However, limited samples make it hard to differentiate accurately between normal and abnormal features, and even harder under category-agnostic conditions. Large-scale pre-training of foundation visual encoders has advanced many fields, as the enormous quantity of data helps to learn the general distribution of normal images. We observe that the amount of anomaly in an image correlates directly with the difference in the learned embeddings, and we use this observation to design a few-shot anomaly detector termed FoundAD. FoundAD learns a nonlinear projection operator onto the natural image manifold. This simple operator is an effective tool for anomaly detection: it characterizes and identifies out-of-distribution regions in an image. Extensive experiments show that our approach supports multi-class detection and achieves competitive performance while using substantially fewer parameters than prior methods. Backed by evaluations with multiple foundation encoders, including the recent DINOv3, we believe this idea broadens the perspective on foundation features and advances the field of few-shot anomaly detection.
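The core idea in the abstract, scoring each image patch by how far its embedding moves when projected onto the manifold of normal images, can be illustrated with a small sketch. This is not the FoundAD implementation. A linear PCA projection fitted on normal patch embeddings stands in for the learned nonlinear operator, random arrays stand in for encoder features, and the names `fit_linear_projector` and `difference_heatmap` are hypothetical.

```python
import numpy as np


def fit_linear_projector(normal_patches, rank):
    """Fit a rank-`rank` PCA subspace to normal patch embeddings of shape (N, D).

    ASSUMPTION: a linear projection is used here only as a stand-in for the
    learned nonlinear projection operator described in the abstract.
    """
    mean = normal_patches.mean(axis=0)
    # Right singular vectors are the principal directions of the normal data.
    _, _, vt = np.linalg.svd(normal_patches - mean, full_matrices=False)
    basis = vt[:rank]  # shape (rank, D)

    def project(z):
        # Orthogonal projection onto the affine subspace of normal embeddings.
        return (z - mean) @ basis.T @ basis + mean

    return project


def difference_heatmap(patches, project, grid_hw):
    """Per-patch L2 distance between each embedding and its projection."""
    residual = patches - project(patches)
    return np.linalg.norm(residual, axis=-1).reshape(grid_hw)


# Toy data (ASSUMPTION): normal patch embeddings lie in a 4-dim subspace of R^32.
rng = np.random.default_rng(0)
dim, rank, grid = 32, 4, (8, 8)
latent_basis = rng.normal(size=(rank, dim))
normal = rng.normal(size=(500, rank)) @ latent_basis
project = fit_linear_projector(normal, rank)

# A test "image" of 8x8 patches with one patch pushed off the normal subspace.
test = rng.normal(size=(grid[0] * grid[1], rank)) @ latent_basis
anomalous_idx = 27  # row 3, column 3 in the 8x8 grid
test[anomalous_idx] += 5.0 * rng.normal(size=dim)

heatmap = difference_heatmap(test, project, grid)
peak = np.unravel_index(np.argmax(heatmap), grid)  # location of the largest score
image_score = heatmap.max()  # image-level score taken as the heatmap maximum
```

In this toy setup the heatmap is close to zero everywhere except at the perturbed patch, so both localization (`peak`) and an image-level decision (`image_score`) follow from the same embedding difference. Taking the maximum as the image-level score is one simple choice made for this sketch, not one stated in the abstract.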

Problem

Research questions and friction points this paper is trying to address.

Developing few-shot anomaly detection for industrial safety inspection

Addressing category-agnostic anomaly detection with limited samples

Identifying out-of-distribution regions using foundation visual encoders

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning nonlinear projection onto image manifold

Utilizing foundation encoder embedding differences

Multi-class detection with fewer parameters

🔎 Similar Papers

AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2