Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

📅 2024-05-23

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 1

🤖 AI Summary

In multi-class unsupervised anomaly detection (UAD), unified models still lag far behind specialized class-separated (single-class) approaches. Method: This paper proposes Dinomaly, a minimalistic, pure-Transformer reconstruction framework that avoids complex designs, additional modules, and specialized tricks. Built only from attention and MLP layers, it relies on four components: Foundation Transformers for feature extraction, a Noisy Bottleneck that reuses existing Dropout layers for noise injection, Linear Attention that naturally cannot focus, and a Loose Reconstruction objective that does not force layer-to-layer or point-by-point reconstruction. Together these embody the "less is more" philosophy of the title. Contribution/Results: Dinomaly reports image-level AUROC scores of 99.6%, 98.7%, and 89.3% on MVTec-AD, VisA, and Real-IAD, respectively, which the authors describe as superior to state-of-the-art multi-class UAD methods and as the most advanced results among class-separated UAD methods as well.
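For readers unfamiliar with the metric, image-level AUROC can be read as the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal image. The following is a minimal sketch of that definition in plain Python; the function name `auroc` and the toy scores are illustrative assumptions, not taken from the paper or its code.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (anomalous, normal) pairs ranked correctly.

    scores: one anomaly score per image (higher = more anomalous).
    labels: 1 for anomalous images, 0 for normal images.
    Ties between an anomalous and a normal score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one normal and one anomalous image")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): one anomalous image (0.35) is ranked
# below one normal image (0.4), so 3 of the 4 pairs are ordered correctly.
example = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The quadratic pairwise loop is chosen for readability; on large test sets a rank-based computation would be the usual choice.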

📝 Abstract

Recent studies have highlighted a practical setting of unsupervised anomaly detection (UAD) that builds a unified model for multi-class images. Despite various advancements addressing this challenging task, the detection performance under the multi-class setting still lags far behind state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly, which leverages pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Given this powerful framework consisting of only Attentions and MLPs, we found four simple components that are essential to multi-class anomaly detection: (1) Foundation Transformers that extract universal and discriminative features, (2) Noisy Bottleneck where pre-existing Dropouts do all the noise injection tricks, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer and point-by-point reconstruction. Extensive experiments are conducted across popular anomaly detection benchmarks including MVTec-AD, VisA, and Real-IAD. Our proposed Dinomaly achieves impressive image-level AUROC of 99.6%, 98.7%, and 89.3% on the three datasets respectively, which is not only superior to state-of-the-art multi-class UAD methods, but also achieves the most advanced class-separated UAD records.
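The abstract does not spell out which linear attention formulation is used. As an illustration only, the sketch below contrasts standard softmax attention with one common kernel-based linear attention that uses the positive feature map elu(x) + 1; that choice, the NumPy implementation, and all function names are assumptions made here rather than details confirmed by the source.

```python
import numpy as np


def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1: x + 1 for x > 0, exp(x) otherwise.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def softmax_attention(q, k, v):
    # Standard scaled dot-product attention; returns output and weight matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def linear_attention(q, k, v):
    # Kernelized attention: softmax is replaced by phi(q) . phi(k), so the
    # key/value summary can be computed once and reused for every query.
    phi_q, phi_k = elu_plus_one(q), elu_plus_one(k)
    kv = phi_k.T @ v                        # (d, d_v) summary of keys and values
    z = phi_q @ phi_k.sum(axis=0)           # (n,) per-query normalizer
    out = (phi_q @ kv) / z[:, None]
    # Implicit weights, materialized here only so they can be inspected.
    weights = (phi_q @ phi_k.T) / z[:, None]
    return out, weights


rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
soft_out, soft_w = softmax_attention(q, k, v)
lin_out, lin_w = linear_attention(q, k, v)
```

Both variants produce non-negative weights that sum to one per query. Comparing the rows of `soft_w` and `lin_w` on real features is one way to inspect how sharply each variant concentrates on individual tokens.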

Problem

Research questions and friction points this paper is trying to address.

Unified multi-class UAD models still lag far behind state-of-the-art class-separated models

Whether a minimalistic pure-Transformer framework, without complex designs or additional modules, can close that gap

Which simple components are essential for multi-class anomaly detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimalistic Transformer-based reconstruction framework

Four simple essential components for detection

No complex designs or additional modules
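Two of the four components, the Noisy Bottleneck and Loose Reconstruction, can be illustrated with a small NumPy sketch. Everything here is an assumption for illustration: noise is modeled as plain inverted dropout, and "loose" is interpreted as averaging features over a group of layers before taking a cosine distance, so that individual layers need not match one to one. The point-by-point relaxation mentioned in the abstract is not modeled, and none of the function names come from the paper.

```python
import numpy as np


def dropout(x, p, rng):
    # Inverted dropout: zero each entry with probability p, rescale survivors.
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


def cosine_distance(a, b, eps=1e-8):
    # Per-token cosine distance between two (tokens, channels) feature maps.
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return 1.0 - num / den


def layerwise_loss(enc_layers, dec_layers):
    # Strict variant: decoder layer i must reconstruct encoder layer i.
    return float(np.mean([cosine_distance(e, d).mean()
                          for e, d in zip(enc_layers, dec_layers)]))


def grouped_loss(enc_layers, dec_layers):
    # Loose variant (assumed): compare the layer-averaged features of each group.
    enc_group = np.mean(enc_layers, axis=0)
    dec_group = np.mean(dec_layers, axis=0)
    return float(cosine_distance(enc_group, dec_group).mean())


rng = np.random.default_rng(0)
enc = [rng.normal(size=(5, 8)) for _ in range(3)]   # toy encoder features
dec = enc[::-1]                                      # same features, layer order reversed
noisy = dropout(enc[0], p=0.5, rng=rng)              # bottleneck noise on one feature map

strict = layerwise_loss(enc, dec)   # penalizes the layer mismatch
loose = grouped_loss(enc, dec)      # close to zero: the groups agree as a whole
```

In this toy setup the decoder reproduces the encoder features in a different layer order, which the strict loss penalizes while the grouped loss does not.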

🔎 Similar Papers

Anomaly Detection by Context Contrasting