🤖 AI Summary
Problem: Downsampling in industrial anomaly detection often causes tiny defects to be missed, because resolution is lost in both the input images and the ground-truth masks. Method: This paper introduces a high-resolution anomaly segmentation benchmark that systematically evaluates how well models localize defects of different sizes. It also proposes a forward-backward feature transfer mechanism, realized as a lightweight unsupervised framework in which two shallow MLPs (the Students) learn to transfer patch-level features across the layers of a frozen ViT-based Teacher. Contribution/Results: The benchmark includes a quantitative metric for robustness to defect size. The proposed method attains the highest robustness to defect size and the fastest inference speed among the evaluated methods, with state-of-the-art performance on MVTec AD and state-of-the-art segmentation performance on VisA.
📝 Abstract
Motivated by efficiency requirements, most anomaly detection and segmentation (AD&S) methods focus on processing low-resolution images, e.g., $224 \times 224$ pixels, obtained by downsampling the original input images. In this setting, downsampling is typically also applied to the provided ground-truth defect masks. Yet, as numerous industrial applications demand identification of both large and tiny defects, this protocol may fail to give a realistic picture of the performance actually attainable by current methods. Hence, in this work, we introduce a novel benchmark that evaluates methods on the original, high-resolution images and ground-truth masks, focusing on segmentation performance as a function of anomaly size. Our benchmark includes a metric that captures robustness with respect to defect size, i.e., the ability of a method to preserve good localization from large anomalies to tiny ones. Furthermore, we introduce an AD&S approach based on a novel Teacher-Student paradigm which relies on two shallow MLPs (the Students) that learn to transfer patch features across the layers of a frozen vision transformer (the Teacher). By means of our benchmark, we evaluate our proposal and other recent AD&S methods on high-resolution inputs containing large and tiny defects. Our proposal features the highest robustness to defect size, runs at the fastest speed, and yields state-of-the-art performance on the MVTec AD dataset and state-of-the-art segmentation performance on the VisA dataset.
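
To make the concern about downsampled ground-truth masks concrete, the following is a minimal sketch, not the paper's evaluation code. It assumes a hypothetical $1024 \times 1024$ binary mask, invented defect positions, and plain nearest-neighbour resampling; the helper `downsample_nearest` is illustrative and not part of any benchmark API.

```python
import numpy as np


def downsample_nearest(mask: np.ndarray, out_size: int) -> np.ndarray:
    """Nearest-neighbour downsampling of a 2D mask (illustrative assumption)."""
    h, w = mask.shape
    # Index of the source row/column that each output row/column samples.
    rows = (np.arange(out_size) * h) // out_size
    cols = (np.arange(out_size) * w) // out_size
    return mask[np.ix_(rows, cols)]


# Hypothetical high-resolution ground truth with one large and one tiny defect.
mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[100:300, 100:300] = 1   # large defect: 200 x 200 pixels
mask[701:703, 501:503] = 1   # tiny defect: 2 x 2 pixels

small = downsample_nearest(mask, 224)

# The large defect lands in the top half of the downsampled mask and the
# tiny one would land in the bottom half, so the halves can be checked
# separately.
large_survives = bool(small[:112].any())
tiny_survives = bool(small[112:].any())

print("large defect kept:", large_survives)
print("tiny defect kept:", tiny_survives)
```

In this particular configuration the tiny defect falls between two sampled rows and disappears from the $224 \times 224$ mask, while the large defect is preserved. Other interpolation schemes (e.g., area averaging followed by thresholding) behave differently, but a defect of a few pixels can still be lost or distorted, which is the effect that evaluating on the original resolution is meant to avoid.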