🤖 AI Summary
This work addresses the high training or inference cost and limited cross-domain stability of zero-shot anomaly detection methods that rely on prompt learning or complex modeling. The authors propose MRAD, a memory-driven retrieval framework whose train-free base model, MRAD-TF, uses a frozen CLIP image encoder to build image-level and pixel-level memory banks from auxiliary data and produces anomaly scores directly by similarity retrieval, avoiding parametric fitting. Two lightweight variants build on this base: MRAD-FT, which fine-tunes the retrieval metric with two linear layers to improve discriminability, and MRAD-CLIP, which injects region priors from MRAD-FT as dynamic biases into CLIP's learnable text prompts to improve generalization to unseen categories. Experiments across 16 industrial and medical datasets show that MRAD delivers superior anomaly classification and segmentation performance in both train-free and training-based settings, supporting the use of the empirical data distribution for efficient zero-shot anomaly detection.
📝 Abstract
Zero-shot anomaly detection (ZSAD) often leverages pretrained vision or vision-language models, but many existing methods use prompt learning or complex modeling to fit the data distribution, resulting in high training or inference cost and limited cross-domain stability. To address these limitations, we propose the Memory-Retrieval Anomaly Detection method (MRAD), a unified framework that replaces parametric fitting with direct memory retrieval. The train-free base model, MRAD-TF, freezes the CLIP image encoder and constructs a two-level memory bank (image-level and pixel-level) from auxiliary data, in which feature-label pairs are explicitly stored as keys and values. During inference, anomaly scores are obtained directly by similarity retrieval over the memory bank. Building on MRAD-TF, we further propose two lightweight variants: (i) MRAD-FT fine-tunes the retrieval metric with two linear layers to enhance the discriminability between normal and anomalous samples; (ii) MRAD-CLIP injects the normal and anomalous region priors from MRAD-FT as dynamic biases into CLIP's learnable text prompts, strengthening generalization to unseen categories. Across 16 industrial and medical datasets, the MRAD framework consistently demonstrates superior performance in anomaly classification and segmentation under both train-free and training-based settings. Our work shows that fully leveraging the empirical distribution of raw data, rather than relying only on model fitting, can achieve stronger anomaly detection performance. The code will be publicly released at https://github.com/CROVO1026/MRAD.
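
To make the retrieval idea concrete, the following is a minimal sketch of key-value memory retrieval, not the authors' implementation. It assumes that keys are L2-normalized features, values are binary labels (0 = normal, 1 = anomalous), and the anomaly score is a softmax-weighted average of stored labels over cosine similarities. The abstract does not specify the aggregation rule or the temperature, so both are assumptions here, as are the function names `build_memory` and `retrieve_scores`. Random vectors stand in for frozen CLIP features.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_memory(features, labels):
    """Store feature-label pairs: keys are unit-norm features, values are 0/1 labels."""
    keys = l2_normalize(np.asarray(features, dtype=np.float64))
    values = np.asarray(labels, dtype=np.float64)
    return keys, values


def retrieve_scores(queries, keys, values, temperature=0.07):
    """Score each query as a similarity-weighted average of the stored labels.

    The softmax weighting and the temperature value are assumptions made for
    this sketch; the source only states that scores come from similarity
    retrieval over the memory bank.
    """
    q = l2_normalize(np.asarray(queries, dtype=np.float64))
    sims = q @ keys.T                              # cosine similarity, shape (Q, M)
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)    # subtract row max for stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ values                        # scores in [0, 1]


# Toy stand-in for encoder features: two well-separated clusters.
rng = np.random.default_rng(0)
dim = 16
normal_center = np.zeros(dim)
normal_center[0] = 1.0
anomalous_center = np.zeros(dim)
anomalous_center[1] = 1.0

normal_feats = normal_center + 0.05 * rng.standard_normal((50, dim))
anomalous_feats = anomalous_center + 0.05 * rng.standard_normal((50, dim))

keys, values = build_memory(
    np.vstack([normal_feats, anomalous_feats]),
    np.concatenate([np.zeros(50), np.ones(50)]),
)

# One query near each cluster: the first should score low, the second high.
scores = retrieve_scores(np.stack([normal_center, anomalous_center]), keys, values)
print(scores)
```

Under these assumptions, the same function could serve both memory levels: image-level retrieval would take one global feature per image as the query, while pixel-level retrieval would take one feature per patch and reshape the resulting scores into an anomaly map.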