🤖 AI Summary
We address zero-shot image anomaly localization—localizing anomalous regions in a test image without access to any normal training samples. To this end, we propose the Single Shot Decomposition Network (SSDnet), the first method to incorporate the deep image prior into zero-shot anomaly detection. SSDnet leverages self-supervised reconstruction from a single test image to model the intrinsic structural prior of normal appearance. To prevent a trivial identity mapping, it employs block masking, spatial shuffling, and Gaussian noise perturbations. Furthermore, we introduce a perceptual loss based on inner-product similarity to enhance structural awareness. Evaluated on MVTec-AD and a fabric dataset, SSDnet achieves 0.99/0.98 AUROC and 0.60/0.67 AUPRC, respectively—substantially surpassing state-of-the-art methods. Crucially, SSDnet requires no external data or normal exemplars, enabling truly data-free, single-image-driven anomaly localization with high precision.
📝 Abstract
Anomaly detection in images is typically addressed by learning from collections of training data or by relying on reference samples. In many real-world scenarios, however, such training data may be unavailable, and only the test image itself is provided. We address this zero-shot setting with the Single Shot Decomposition Network (SSDnet), a single-image anomaly localization method that leverages the inductive bias of convolutional neural networks, inspired by Deep Image Prior (DIP). Our key assumption is that natural images often exhibit unified textures and patterns, and that anomalies manifest as localized deviations from these repetitive or stochastic patterns. To learn the deep image prior, we design a patch-based training framework in which the input image is fed directly into the network for self-reconstruction, rather than mapping random noise to the image as done in DIP. To prevent the model from simply learning an identity mapping, we apply masking, patch shuffling, and small Gaussian noise. In addition, we use a perceptual loss based on inner-product similarity to capture structure beyond pixel fidelity. Our approach needs no external training data, labels, or references, and remains robust in the presence of noise or missing pixels. SSDnet achieves 0.99 AUROC and 0.60 AUPRC on MVTec-AD and 0.98 AUROC and 0.67 AUPRC on the fabric dataset, outperforming state-of-the-art methods. The implementation code will be released at https://github.com/mehrdadmoradi124/SSDnet.
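The two anti-identity ingredients described above—corrupting the input (masking, patch shuffling, small Gaussian noise) and scoring the reconstruction with an inner-product perceptual loss—can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, block size, and corruption fractions are hypothetical placeholders, and the loss shown is a standard normalized inner-product (cosine) dissimilarity on feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(image, block=8, mask_frac=0.25, shuffle_frac=0.25, noise_std=0.02):
    """Corrupt a single image so a reconstruction network cannot learn
    a trivial identity map. All parameters here are illustrative."""
    h, w = image.shape[:2]
    out = image.astype(float).copy()
    coords = [(y, x) for y in range(h // block) for x in range(w // block)]
    # Block masking: zero out a random subset of blocks.
    for y, x in rng.permutation(coords)[: int(mask_frac * len(coords))]:
        out[y*block:(y+1)*block, x*block:(x+1)*block] = 0.0
    # Patch shuffling: swap random pairs of blocks.
    picks = rng.permutation(coords)[: 2 * int(shuffle_frac * len(coords) / 2)]
    for (y1, x1), (y2, x2) in zip(picks[0::2], picks[1::2]):
        tmp = out[y1*block:(y1+1)*block, x1*block:(x1+1)*block].copy()
        out[y1*block:(y1+1)*block, x1*block:(x1+1)*block] = \
            out[y2*block:(y2+1)*block, x2*block:(x2+1)*block]
        out[y2*block:(y2+1)*block, x2*block:(x2+1)*block] = tmp
    # Small additive Gaussian noise.
    return out + rng.normal(0.0, noise_std, out.shape)

def inner_product_loss(feat_a, feat_b, eps=1e-8):
    """Perceptual loss from normalized inner-product similarity between
    per-sample feature vectors; approaches 0 when features align."""
    a = feat_a.reshape(feat_a.shape[0], -1)
    b = feat_b.reshape(feat_b.shape[0], -1)
    sim = (a * b).sum(axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    return float((1.0 - sim).mean())
```

In a full pipeline, the corrupted image would be passed through the reconstruction network, features of the output and the clean input would be compared with the inner-product loss, and the per-pixel reconstruction error would then serve as the anomaly map.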