Learning Multi-view Anomaly Detection

📅 2024-07-16
🏛️ arXiv.org
📈 Citations: 3
Influential: 1
🤖 AI Summary
Single-view anomaly detection suffers from viewpoint bias, leading to inaccurate sample-level predictions. To address this, we propose a Multi-View Anomaly Detection (MVAD) framework centered on the Multi-View Adaptive Selection (MVAS) algorithm, which enables cross-view feature learning and fusion. Our key contributions include: (i) the first neighborhood-aware attention window mechanism for semantic correlation modeling, supporting adjustable window sizes and top-K sparsity pruning to achieve linear computational complexity; and (ii) the first unified optimization, under both one-class and multi-class settings, for joint anomaly detection at the sample, image, and pixel levels. The method comprises multi-view feature encoding, neighborhood window partitioning, cross-view semantic correlation matrix construction, and a lightweight fusion network. On Real-IAD, MVAD achieves state-of-the-art performance across all ten metrics: +4.1% (sample-level), +5.6% (image-level), and +6.7% (pixel-level) AUROC, with only 18M parameters and lower GPU memory use and training time.

📝 Abstract
This study explores the recently proposed and challenging multi-view Anomaly Detection (AD) task. Single-view methods encounter blind spots from other perspectives, resulting in inaccurate sample-level predictions. Therefore, we introduce the Multi-View Anomaly Detection (MVAD) framework, which learns and integrates features from multiple views. Specifically, we propose a Multi-View Adaptive Selection (MVAS) algorithm for feature learning and fusion across multiple views. The feature maps are divided into neighbourhood attention windows, and a semantic correlation matrix is calculated between each single-view window and the windows of all other views. An attention mechanism is then conducted between each single-view window and its top-K most correlated multi-view windows. Adjusting the window sizes and top-K can reduce the computational complexity to linear. Extensive experiments on the Real-IAD dataset in cross-setting (multi/single-class) validate the effectiveness of our approach, which achieves state-of-the-art performance with improvements of 4.1%↑ (sample) / 5.6%↑ (image) / 6.7%↑ (pixel) across a total of ten metrics, with only 18M parameters and less GPU memory and training time.
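The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each window is summarised by the mean of its tokens, that correlation is a plain dot product between those summaries, and that fusion is ordinary scaled dot-product attention without learned projections. All names (`mean_token`, `top_k_windows`, `attend`, `fuse_window`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Window = List[Vector]   # tokens inside one attention window
View = List[Window]     # all windows of one view


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_token(window: Window) -> Vector:
    """Summarise a window by averaging its tokens (an assumption of this sketch)."""
    n = len(window)
    return [sum(tok[j] for tok in window) / n for j in range(len(window[0]))]


def top_k_windows(views: List[View], v: int, w: int, k: int) -> List[Tuple[int, int]]:
    """Indices (view, window) of the k windows in the OTHER views whose
    summaries correlate most strongly with window w of view v."""
    query = mean_token(views[v][w])
    scored = [
        (dot(query, mean_token(win)), (u, i))
        for u, view in enumerate(views) if u != v
        for i, win in enumerate(view)
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scored[:k]]


def attend(queries: Window, context: Window) -> Window:
    """Scaled dot-product attention of query tokens over context tokens."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        logits = [dot(q, c) * scale for c in context]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        fused.append([
            sum(wt * c[j] for wt, c in zip(weights, context)) / total
            for j in range(len(q))
        ])
    return fused


def fuse_window(views: List[View], v: int, w: int, k: int):
    """Select the top-k correlated windows, then attend only over their tokens."""
    selected = top_k_windows(views, v, w, k)
    context = [tok for (u, i) in selected for tok in views[u][i]]
    return selected, attend(views[v][w], context)
```

Calling `fuse_window(views, v=0, w=0, k=2)` returns the chosen `(view, window)` pairs together with the fused tokens for that window. Because each window attends only to `k` other windows rather than to every token of every view, the attention cost per window is set by `k` and the window size.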
Problem

Research questions and friction points this paper is trying to address.

Multi-view anomaly detection addresses single-view blind spots
Adaptive selection algorithm integrates features across multiple views
Efficient attention mechanism minimizes computational complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view adaptive selection algorithm for integration
Neighbourhood attention windows calculate semantic correlation matrix
Adjustable window sizes minimize computational complexity
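To make the complexity claim concrete, the following is a rough multiply-add count, offered as a back-of-the-envelope sketch rather than the paper's own analysis. It assumes every view has the same number of tokens, windows do not overlap, window summaries are compared by dot product, and attention costs one pass for the scores and one for the weighted sum. The function name `mvas_cost` and its parameters are hypothetical.

```python
from typing import Dict


def mvas_cost(tokens_per_view: int, window_tokens: int,
              num_views: int, top_k: int, dim: int) -> Dict[str, int]:
    """Approximate multiply-add counts under the assumptions stated above."""
    windows_per_view = tokens_per_view // window_tokens

    # Window-level correlation: each window summary against every window
    # summary in the other views.
    routing = (num_views * windows_per_view
               * (num_views - 1) * windows_per_view * dim)

    # Sparse attention: each token attends to top_k windows' worth of tokens
    # (factor 2 covers the score computation and the weighted sum).
    attention = 2 * num_views * tokens_per_view * top_k * window_tokens * dim

    # Dense baseline: each token attends to every token of every other view.
    dense = (2 * num_views * tokens_per_view
             * (num_views - 1) * tokens_per_view * dim)

    return {"routing": routing, "attention": attention, "dense": dense}


if __name__ == "__main__":
    # Illustrative sizes only; none of these values come from the paper.
    for n in (1024, 2048, 4096):
        print(n, mvas_cost(n, window_tokens=64, num_views=5, top_k=4, dim=256))
```

With the window size and top-K held fixed, the `attention` term grows in proportion to the token count, whereas the `dense` baseline grows with its square. Under this accounting the `routing` term still depends on the square of the number of windows, which shrinks as windows get larger; how the paper treats that term is not stated in the abstract.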
Haoyang He
State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China
Jiangning Zhang
YouTu Lab, Tencent, Shanghai 200233, China
Guanzhong Tian
Ningbo Research Institute, Zhejiang University
Computer Vision, Model Compression, Pattern Recognition
Chengjie Wang
YouTu Lab, Tencent, Shanghai 200233, China
Lei Xie
State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China