From Benchmarks to Reality: Advancing Visual Anomaly Detection by the VAND 3.0 Challenge

📅 2025-09-22
🤖 AI Summary
This study addresses two core challenges in real-world visual anomaly detection: robustness to distribution shifts and few-shot generalization. Through the VAND 3.0 Challenge, it presents an evaluation framework that jointly emphasizes out-of-distribution robustness and few-shot adaptability. The participating solutions systematically leverage large pre-trained vision and vision-language models (e.g., ViT, CLIP), building backbone-driven pipelines for feature fusion and few-shot adaptation that incorporate optimized fine-tuning strategies and cross-modal feature alignment. Results show substantial improvements over previous baselines on both tracks, underlining the pivotal role of strong backbone architectures. The analysis also exposes a tension between model scale and the computational efficiency required for real-time, on-site deployment, highlighting bottlenecks in current approaches. The work thus provides new insights, practical design principles, and a rigorous benchmark for lightweight, robust, and production-ready anomaly detection systems.

📝 Abstract
Visual anomaly detection is a strongly application-driven field of research. Consequently, the connection between academia and industry is of paramount importance. In this regard, we present the VAND 3.0 Challenge to showcase current progress in anomaly detection across different practical settings whilst addressing critical issues in the field. The challenge hosted two tracks, fostering the development of anomaly detection methods robust against real-world distribution shifts (Category 1) and exploring the capabilities of Vision Language Models within the few-shot regime (Category 2), respectively. The participants' solutions reached significant improvements over previous baselines by combining or adapting existing approaches and fusing them with novel pipelines. While for both tracks the progress in large pre-trained vision (language) backbones played a pivotal role for the performance increase, scaling up anomaly detection methods more efficiently needs to be addressed by future research to meet real-time and computational constraints on-site.
Problem

Research questions and friction points this paper is trying to address.

Advancing visual anomaly detection for real-world industrial applications
Developing methods robust against real-world distribution shifts
Exploring the capabilities of Vision Language Models in few-shot anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust anomaly detection against distribution shifts
Few-shot anomaly detection using Vision Language Models
Leveraging large pre-trained vision language backbones
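The few-shot, vision-language track (Category 2) builds on CLIP-style matching: an image embedding is compared against text embeddings of "normal" and "anomalous" prompts, and a softmax over the similarities yields an anomaly score. The sketch below illustrates only that scoring step, with toy 2-D vectors standing in for real CLIP embeddings; the function names, prompt wording, and temperature value are illustrative assumptions, not the challenge entries' actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def anomaly_score(image_emb, normal_embs, anomalous_embs, temperature=0.07):
    """Softmax over the best-matching 'normal' vs. 'anomalous' prompt.

    Returns a value in (0, 1); higher means more anomalous.
    """
    s_normal = max(cosine(image_emb, e) for e in normal_embs)
    s_anomalous = max(cosine(image_emb, e) for e in anomalous_embs)
    e_n = math.exp(s_normal / temperature)
    e_a = math.exp(s_anomalous / temperature)
    return e_a / (e_n + e_a)

# Toy stand-ins for CLIP text embeddings (hypothetical values).
normal_prompts = [[1.0, 0.0]]     # e.g. "a photo of a flawless object"
anomalous_prompts = [[0.0, 1.0]]  # e.g. "a photo of a damaged object"

# Toy stand-ins for CLIP image embeddings.
defect_image = [0.1, 0.99]   # closer to the anomalous prompt
good_image = [0.99, 0.1]     # closer to the normal prompt
```

In practice, the embeddings would come from a frozen CLIP backbone, the prompt sets would be ensembles of class-specific templates, and the few-shot regime would add embeddings of the handful of labeled reference images to the prompt pools.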
Lars Heckler-Kram
MVTec Software GmbH
Ashwin Vaidya
Intel
Jan-Hendrik Neudeck
MVTec Software GmbH
Ulla Scheler
MVTec Software GmbH
Dick Ameln
Intel
Samet Akcay
Intel
Paula Ramos
Voxel51