🤖 AI Summary
This work addresses a core difficulty in unsupervised anomaly detection: matching test images to normal templates is noisy because of intra-class variation, misalignment, and limited template representativeness. To mitigate this, the authors propose RAID, a framework that brings retrieval-augmented mechanisms to the task. RAID uses a hierarchical vector database to retrieve normal samples at the category, semantic, and instance levels, forming a coarse-to-fine, multi-stage retrieval pipeline. It then combines a matching cost volume with a guided Mixture-of-Experts network that uses the retrieved samples to suppress matching noise during anomaly map generation. The authors report state-of-the-art performance on the MVTec, VisA, MPDD, and BTAD benchmarks under full-shot, few-shot, and multi-dataset settings, for both anomaly detection and localization.
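The coarse-to-fine retrieval described above can be illustrated with a small sketch. This is not the authors' implementation: the database layout (a dict of categories, each holding semantic clusters with a `centroid` and an array of `instances`), the function name `retrieve_coarse_to_fine`, and the use of cosine similarity are all assumptions made for illustration.

```python
import numpy as np

def normalize(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_coarse_to_fine(query, category, database, k=2):
    """Hypothetical three-stage lookup: category -> semantic cluster -> instances."""
    # Stage 1 (category level): restrict the search to one object category.
    clusters = database[category]
    q = normalize(query)
    # Stage 2 (semantic level): pick the cluster whose centroid is most similar.
    centroids = normalize(np.stack([c["centroid"] for c in clusters]))
    best = int(np.argmax(centroids @ q))
    # Stage 3 (instance level): rank that cluster's instances by cosine similarity.
    instances = normalize(clusters[best]["instances"])
    sims = instances @ q
    top = np.argsort(-sims)[:k]
    return best, top, sims[top]

# Toy database (assumed layout): one category, two semantic clusters of 2-D embeddings.
database = {
    "bottle": [
        {"centroid": np.array([1.0, 0.0]),
         "instances": np.array([[1.0, 0.1], [0.9, -0.2], [1.0, 0.5]])},
        {"centroid": np.array([0.0, 1.0]),
         "instances": np.array([[0.1, 1.0], [-0.2, 0.9]])},
    ]
}

cluster_id, instance_ids, scores = retrieve_coarse_to_fine(
    np.array([0.95, 0.1]), "bottle", database, k=2)
```

Each stage narrows the candidate set before the next, more expensive comparison, which is the general motivation for a coarse-to-fine design; a real system would presumably use learned embeddings and an approximate nearest-neighbour index rather than dense NumPy arrays.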
📝 Abstract
Unsupervised Anomaly Detection (UAD) aims to identify abnormal regions by establishing correspondences between test images and normal templates. Existing methods primarily rely on image reconstruction or template retrieval but face a fundamental challenge: matching between test images and normal templates inevitably introduces noise due to intra-class variations, imperfect correspondences, and limited templates. Observing that Retrieval-Augmented Generation (RAG) leverages retrieved samples directly in the generation process, we reinterpret UAD through this lens and introduce RAID, a retrieval-augmented UAD framework designed for noise-resilient anomaly detection and localization. Unlike standard RAG, which enriches context or knowledge, we focus on using retrieved normal samples to guide noise suppression in anomaly map generation. RAID retrieves class-, semantic-, and instance-level representations from a hierarchical vector database, forming a coarse-to-fine pipeline. A matching cost volume correlates the input with retrieved exemplars, followed by a guided Mixture-of-Experts (MoE) network that leverages the retrieved samples to adaptively suppress matching noise and produce fine-grained anomaly maps. RAID achieves state-of-the-art performance across full-shot, few-shot, and multi-dataset settings on the MVTec, VisA, MPDD, and BTAD benchmarks. Code: https://github.com/Mingxiu-Cai/RAID
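To make the matching cost volume concrete, here is a minimal sketch under stated assumptions. It is not taken from the RAID code: the function names, the feature shapes, and the choice of cosine similarity are hypothetical, and the final readout is a naive baseline standing in for the guided MoE network, which the abstract describes as the component that actually suppresses matching noise.

```python
import numpy as np

def matching_cost_volume(test_feats, template_feats):
    """Cosine similarity between every test patch and every retrieved template patch.

    test_feats:     (N, D) patch features of the test image.
    template_feats: (K, M, D) patch features of K retrieved normal exemplars.
    Returns a (K, N, M) volume of similarities.
    """
    t = test_feats / np.linalg.norm(test_feats, axis=-1, keepdims=True)
    r = template_feats / np.linalg.norm(template_feats, axis=-1, keepdims=True)
    return np.einsum("nd,kmd->knm", t, r)

def naive_anomaly_scores(cost_volume):
    """Baseline readout (NOT the guided MoE): 1 - best match over exemplars and patches."""
    return 1.0 - cost_volume.max(axis=(0, 2))

# Toy input: three test patches, one retrieved exemplar with two patches.
test_feats = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
template_feats = np.array([[[1.0, 0.0], [0.0, 2.0]]])  # K=1, M=2

volume = matching_cost_volume(test_feats, template_feats)
scores = naive_anomaly_scores(volume)  # third patch has no good match, so it scores highest
```

In this toy case the first two test patches each find an identical direction among the template patches and score 0, while the third has no good match and scores 1. A max-based readout like this is sensitive to exactly the matching noise the abstract mentions (intra-class variation, imperfect correspondences), which is the gap a learned, retrieval-guided refinement is meant to close.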