AI Summary
Existing unsupervised anomaly detection (UAD) methods suffer from inaccurate image- or feature-level matching, which is sensitive to noise, blurs edges, and misses subtle anomalies, thereby limiting detection performance. To address this, we propose the first cost volume modeling framework for UAD, paired with a multi-layer, attention-guided 3D convolutional filtering network that serves as a plug-and-play post-processing module. The approach dynamically refines matching costs via cross-layer attention, preserving structural fidelity while enhancing sensitivity to anomalies. The lightweight, modular architecture supports both reconstruction-based and embedding-based UAD paradigms. Extensive experiments on the MVTec-AD and VisA benchmarks show significant improvements in both single-class and multi-class anomaly detection, achieving state-of-the-art performance. Ablation studies confirm the generalizability and robustness of the method across diverse anomaly types and domains. Code and pretrained models are publicly available.
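To make "3D filtering of a cost volume" concrete, here is a minimal sketch, assuming a cost volume stored as `cost[h][w][d]` (two spatial axes plus one matching axis). It is not the paper's network: a fixed box (mean) kernel stands in for the learned, attention-guided 3D convolution, and the function name `conv3d_box` is hypothetical. It only shows how convolving jointly over space and the matching axis damps an isolated noisy cost.

```python
# Toy 3D filtering of a cost volume cost[h][w][d]: two spatial axes (h, w)
# and one matching axis (d). A fixed box (mean) kernel stands in for the
# learned, attention-guided 3D convolution (assumption, not the paper's model).


def conv3d_box(cost, radius=1):
    """Replace each cell with the mean of its (2r+1)^3 neighbourhood.

    Out-of-bounds neighbours are skipped and the mean is taken over the
    cells that exist, so border cells are not pulled down by zero padding.
    """
    H, W, D = len(cost), len(cost[0]), len(cost[0][0])
    out = [[[0.0] * D for _ in range(W)] for _ in range(H)]
    for h in range(H):
        for w in range(W):
            for d in range(D):
                total, count = 0.0, 0
                for dh in range(-radius, radius + 1):
                    for dw in range(-radius, radius + 1):
                        for dd in range(-radius, radius + 1):
                            hh, ww, d2 = h + dh, w + dw, d + dd
                            if 0 <= hh < H and 0 <= ww < W and 0 <= d2 < D:
                                total += cost[hh][ww][d2]
                                count += 1
                out[h][w][d] = total / count
    return out


# Usage: a flat 5x5x4 volume with one isolated high-cost "noise" cell.
cost = [[[0.2] * 4 for _ in range(5)] for _ in range(5)]
cost[2][2][1] = 1.0
smoothed = conv3d_box(cost)
print(round(smoothed[2][2][1], 4))  # spike is pulled toward the 0.2 background
```

The spike drops from 1.0 to roughly 0.23 while the flat background stays at 0.2. A plain box kernel would also blur genuine edges and small anomalies, which is the motivation given for guiding the filter with the input observation rather than using a fixed kernel.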
Abstract
Unsupervised anomaly detection (UAD) seeks to localize the anomaly mask of an input image with respect to normal samples. Whether by reconstructing normal counterparts (reconstruction-based) or by learning an image feature embedding space (embedding-based), existing approaches fundamentally rely on image-level or feature-level matching to derive anomaly scores. Such matching is often inaccurate, yet this inaccuracy is overlooked, leading to sub-optimal detection. To address this issue, we introduce the concept of cost filtering, borrowed from classical matching tasks such as depth and flow estimation, into the UAD problem. We call this approach CostFilter-AD. Specifically, we first construct a matching cost volume between the input and normal samples, comprising two spatial dimensions and one matching dimension that encodes potential matches. To refine this volume, we propose a cost volume filtering network, guided by the input observation as an attention query across multiple feature layers, which effectively suppresses matching noise while preserving edge structures and capturing subtle anomalies. Designed as a generic post-processing plug-in, CostFilter-AD can be integrated with either reconstruction-based or embedding-based methods. Extensive experiments on the MVTec-AD and VisA benchmarks validate the generic benefits of CostFilter-AD for both single- and multi-class UAD tasks. Code and models will be released at https://github.com/ZHE-SAPI/CostFilter-AD.
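The cost volume described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CostFilter-AD implementation: the abstract does not specify the feature extractor, the distance, or exactly what spans the matching dimension, so this sketch assumes a flat pool of D normal feature vectors (for example, features of a reconstructed normal counterpart, or a bank of normal embeddings) and uses cosine distance. The helper names `build_cost_volume` and `anomaly_map` are hypothetical.

```python
import math


def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity between two feature vectors (0 = perfect match)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def build_cost_volume(query, normal_pool):
    """Hypothetical helper (assumed design, not the paper's code).

    query:       H x W grid of C-dim features from the input image.
    normal_pool: list of D C-dim features taken from normal samples.
    Returns cost[h][w][d]: two spatial axes plus one matching axis.
    """
    return [[[cosine_distance(f, n) for n in normal_pool] for f in row]
            for row in query]


def anomaly_map(cost):
    """Unfiltered baseline score: the lowest cost along the matching axis."""
    return [[min(candidates) for candidates in row] for row in cost]


# Usage: 2x2 input grid, pool of D = 2 normal features (C = 2).
normal_pool = [[1.0, 0.0], [0.0, 1.0]]
query = [[[1.0, 0.0], [0.0, 1.0]],
         [[2.0, 0.0], [-1.0, 0.0]]]  # [-1, 0] matches no normal feature well
cost = build_cost_volume(query, normal_pool)
scores = anomaly_map(cost)
print(scores)
```

Here the pixel holding `[-1, 0]` gets a score of 1.0 while the others score near 0. Reducing each pixel's matching axis with a plain minimum is the kind of unrefined matching the abstract argues is noisy; presumably a filtering network would operate on `cost` before such a reduction, though that ordering is an assumption of this sketch.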