🤖 AI Summary
This work addresses few-shot anomaly detection in industrial scenarios by proposing AnomalyDINO, a training-free, purely vision-based method. It combines DINOv2 visual features with the well-established patch-level deep nearest neighbor (k-NN) paradigm, eliminating the need for fine-tuning, meta-learning, or multimodal (e.g., language) supervision, while supporting both image-level anomaly prediction and pixel-level anomaly segmentation. Leveraging patch-level feature matching, unsupervised anomaly scoring, and multi-scale feature fusion, AnomalyDINO achieves 96.6% image-level AUROC on MVTec-AD in the one-shot setting, up from the previous 93.1%. Its low overhead and training-free design facilitate rapid deployment, and validation on industrial defect detection tasks supports its practical use. Crucially, this work demonstrates that high-quality self-supervised visual representations alone can match or exceed the performance of vision-language models in few-shot anomaly detection.
📝 Abstract
Recent advances in multimodal foundation models have set new standards in few-shot anomaly detection. This paper explores whether high-quality visual features alone are sufficient to rival existing state-of-the-art vision-language models. We affirm this by adapting DINOv2 for one-shot and few-shot anomaly detection, with a focus on industrial applications. We show that this approach does not only rival existing techniques but can even outmatch them in many settings. Our proposed vision-only approach, AnomalyDINO, follows the well-established patch-level deep nearest neighbor paradigm, and enables both image-level anomaly prediction and pixel-level anomaly segmentation. The approach is methodologically simple and training-free and, thus, does not require any additional data for fine-tuning or meta-learning. Despite its simplicity, AnomalyDINO achieves state-of-the-art results in one- and few-shot anomaly detection (e.g., pushing the one-shot performance on MVTec-AD from an AUROC of 93.1% to 96.6%). The reduced overhead, coupled with its outstanding few-shot performance, makes AnomalyDINO a strong candidate for fast deployment, e.g., in industrial contexts.
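To make the patch-level deep nearest neighbor idea concrete, here is a minimal NumPy sketch; it is an illustration under stated assumptions, not the authors' implementation. Random vectors stand in for DINOv2 patch embeddings, and the function names, the cosine distance, the top-fraction aggregation, and the 8×8 patch grid are all choices made for this example rather than details given in the abstract.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def patch_anomaly_scores(test_patches, memory_bank):
    """Cosine distance from each test patch to its nearest reference patch.

    test_patches: (n_test, d) patch features of the image under inspection.
    memory_bank:  (n_ref, d) patch features from the nominal reference image(s).
    """
    sim = l2_normalize(test_patches) @ l2_normalize(memory_bank).T
    return 1.0 - sim.max(axis=1)


def image_anomaly_score(patch_scores, top_fraction=0.01):
    """Aggregate patch scores into one image-level score.

    Averaging the highest-scoring patches is an assumption of this sketch.
    """
    k = max(1, int(round(top_fraction * patch_scores.size)))
    return float(np.sort(patch_scores)[-k:].mean())


# Placeholder features: in practice these would come from a DINOv2 backbone.
rng = np.random.default_rng(0)
memory_bank = rng.normal(size=(256, 32))  # one nominal reference image (one-shot)

# A "normal" test image: patches close to ones already in the memory bank.
normal_test = memory_bank[:64] + 0.01 * rng.normal(size=(64, 32))

# An "anomalous" test image: the first 4 patches are unlike anything in the bank.
anomalous_test = normal_test.copy()
anomalous_test[:4] = rng.normal(size=(4, 32))

normal_scores = patch_anomaly_scores(normal_test, memory_bank)
anomalous_scores = patch_anomaly_scores(anomalous_test, memory_bank)

# Reshaping patch scores to the patch grid gives a coarse anomaly map.
anomaly_map = anomalous_scores.reshape(8, 8)

print("normal image score:   ", image_anomaly_score(normal_scores))
print("anomalous image score:", image_anomaly_score(anomalous_scores))
```

No training step appears anywhere in the sketch: the reference patches are simply stored and queried, which is what makes the approach training-free. The same per-patch distances serve both tasks, aggregated for image-level prediction and laid out on the patch grid (and, in a real pipeline, upsampled to image resolution) for pixel-level segmentation.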