Unified Unsupervised Anomaly Detection via Matching Cost Filtering

📅 2025-10-02
🤖 AI Summary
Unsupervised anomaly detection (UAD) faces three key challenges: scarcity of anomalous samples, matching noise in feature correspondence, and fragmented treatment of unimodal versus multimodal data. This paper proposes a unified, matching-centric framework for UAD that supports RGB, RGB-3D, and RGB-Text settings and is built on a learnable matching cost filtering mechanism. The method constructs an anomaly cost volume by matching a test sample against normal samples, then applies a filtering module with multi-layer attention guidance that suppresses matching noise (both intra- and cross-modal) while highlighting subtle anomalies. The framework works as a post-hoc, plug-and-play refinement for existing UAD models. Evaluation across 22 benchmarks reports new state-of-the-art performance for both unimodal and multimodal UAD.

📝 Abstract
Unsupervised anomaly detection (UAD) aims to identify image- and pixel-level anomalies using only normal training data, with wide applications such as industrial inspection and medical analysis, where anomalies are scarce due to privacy concerns and cold-start constraints. Existing methods, whether reconstruction-based (restoring normal counterparts) or embedding-based (using pretrained representations), fundamentally conduct image- or feature-level matching to generate anomaly maps. Nonetheless, matching noise has been largely overlooked, limiting their detection ability. Beyond the earlier focus on unimodal RGB-based UAD, recent advances expand to multimodal scenarios, e.g., RGB-3D and RGB-Text, enabled by point cloud sensing and vision-language models. Despite shared challenges, these lines of work remain largely isolated, hindering comprehensive understanding and knowledge transfer. In this paper, we advocate unified UAD for both unimodal and multimodal settings from a matching perspective. Under this insight, we present Unified Cost Filtering (UCF), a generic post-hoc framework for refining the anomaly cost volume of any UAD model. The cost volume is constructed by matching a test sample against normal samples from the same or different modalities, followed by a learnable filtering module with multi-layer attention guidance from the test sample, mitigating matching noise and highlighting subtle anomalies. Comprehensive experiments on 22 diverse benchmarks demonstrate the efficacy of UCF in enhancing a variety of UAD methods, consistently achieving new state-of-the-art results in both unimodal (RGB) and multimodal (RGB-3D, RGB-Text) UAD scenarios. Code and models will be released at https://github.com/ZHE-SAPI/CostFilter-AD.
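To make the matching view concrete, the following is a minimal sketch of how an anomaly cost volume could be built from patch features. It is not the authors' implementation: the use of cosine distance as the matching cost, the function names, and the toy feature vectors are all assumptions chosen for illustration. The learnable filtering module is not reproduced here; the sketch only shows the unfiltered baseline (the cheapest match per patch) that such a filter would refine.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + 1e-12)

def build_cost_volume(test_feats, normal_feats):
    """cost[i][j] is the matching cost between test patch i and normal patch j.

    Assumption: cosine distance is used as the matching cost.
    """
    return [[cosine_distance(t, n) for n in normal_feats] for t in test_feats]

def unfiltered_scores(cost_volume):
    """Baseline anomaly score per test patch: the cost of its best match.

    A learned filter would refine the cost volume before this reduction.
    """
    return [min(row) for row in cost_volume]

# Toy data (hypothetical): two normal patch features, two test patch features.
normal_patches = [[1.0, 0.0], [0.0, 1.0]]
test_patches = [[2.0, 0.0],    # same direction as a normal patch
                [-1.0, -1.0]]  # unlike every normal patch

cost = build_cost_volume(test_patches, normal_patches)
scores = unfiltered_scores(cost)
print(scores)
```

In this toy case the first test patch finds a near-zero-cost match while the second has no cheap match, so it receives the higher score. Matching noise corresponds to spurious entries in `cost` that make normal patches look expensive or anomalous patches look cheap, which is the failure mode the filtering stage is described as addressing.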
Problem

Research questions and friction points this paper is trying to address.

Addressing matching noise in unsupervised anomaly detection methods
Unifying anomaly detection across unimodal and multimodal scenarios
Refining anomaly cost volumes through learnable filtering modules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Cost Filtering framework for anomaly detection
Post-hoc refinement of anomaly cost volume
Multi-layer attention guidance reduces matching noise
Zhe Zhang
State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China
Mingxiu Cai
State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China
Gaochang Wu
State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China
Jing Zhang
School of Computer Science, Wuhan University, China
Lingqiao Liu
Associate Professor at the University of Adelaide
computer vision, machine learning
Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining
Tianyou Chai
Northeastern University, China
modeling, control, optimization, integrated automation of industrial processes, adaptive control
Xiatian Zhu
University of Surrey
Machine Learning, Computer Vision