🤖 AI Summary
In multi-class unsupervised anomaly detection (UAD), unified models consistently underperform specialized single-class approaches. Method: This paper proposes Dinomaly, a lightweight, pure-Transformer reconstruction framework that abandons hand-crafted components and complex modules. It uses only a basic Transformer architecture (attention + MLP) with four components: Foundation Transformer features, a Dropout-based noisy bottleneck, linear attention (which naturally cannot focus), and a loose reconstruction objective that does not force layer-to-layer or point-by-point reconstruction. Together these embody a “less-is-more” paradigm. Contribution/Results: To our knowledge, this is the first work demonstrating that a minimalist multi-class design can surpass state-of-the-art single-class UAD methods. It achieves image-level AUROC scores of 99.6%, 98.7%, and 89.3% on MVTec-AD, VisA, and Real-IAD, respectively, outperforming state-of-the-art multi-class UAD approaches and also exceeding the best class-separated (single-class) UAD records.
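For readers unfamiliar with the metric, the following is a minimal sketch of how an image-level AUROC can be computed: each test image receives one anomaly score, and the AUROC is the probability that a randomly chosen anomalous image scores higher than a randomly chosen normal one (ties count as half). The function name `image_auroc` and the toy scores are illustrative assumptions, not taken from the paper, and the sketch does not cover how a per-image score is derived from a model's output.

```python
def image_auroc(scores, labels):
    """Pairwise AUROC over per-image anomaly scores.

    scores: one anomaly score per image (higher = more anomalous).
    labels: 1 for anomalous images, 0 for normal images.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous image")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous image correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): two normal and two anomalous images.
print(image_auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value more efficiently on large test sets.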
📝 Abstract
Recent studies have highlighted a practical setting of unsupervised anomaly detection (UAD) that builds a unified model for multi-class images. Despite various advancements addressing this challenging task, detection performance under the multi-class setting still lags far behind state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly, which leverages pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Given this powerful framework consisting only of Attentions and MLPs, we found four simple components that are essential to multi-class anomaly detection: (1) Foundation Transformers that extract universal and discriminative features, (2) Noisy Bottleneck where pre-existing Dropouts do all the noise injection tricks, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer and point-by-point reconstruction. Extensive experiments are conducted across popular anomaly detection benchmarks including MVTec-AD, VisA, and Real-IAD. Our proposed Dinomaly achieves impressive image-level AUROC of 99.6%, 98.7%, and 89.3% on the three datasets respectively, which is not only superior to state-of-the-art multi-class UAD methods but also surpasses the most advanced class-separated UAD records.
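To make component (3) more concrete, here is a minimal, dependency-free sketch of kernelized linear attention. It assumes the commonly used feature map φ(x) = elu(x) + 1; the abstract does not specify which kernel Dinomaly uses, so this choice and the function names (`feature_map`, `linear_attention`, `implied_weights`) are illustrative assumptions rather than the authors' implementation. Linear attention replaces the softmax over query-key scores with a normalized product of positive feature maps, so it never forms an exponentially sharpened attention map, which is one way to read "cannot focus".

```python
import math


def feature_map(x):
    """elu(x) + 1 (assumed kernel): strictly positive for every input."""
    return x + 1.0 if x > 0 else math.exp(x)


def _phi(rows):
    return [[feature_map(x) for x in row] for row in rows]


def linear_attention(Q, K, V):
    """Single-head linear attention on lists of token vectors.

    Computes phi(Q) (phi(K)^T V) / (phi(Q) sum_n phi(K_n)) without ever
    building the N x N attention matrix.
    """
    phi_q, phi_k = _phi(Q), _phi(K)
    d, d_v = len(K[0]), len(V[0])
    # kv[i][j] = sum over tokens n of phi_k[n][i] * V[n][j]
    kv = [[sum(k[i] * v[j] for k, v in zip(phi_k, V)) for j in range(d_v)]
          for i in range(d)]
    k_sum = [sum(k[i] for k in phi_k) for i in range(d)]
    out = []
    for q in phi_q:
        z = sum(q[i] * k_sum[i] for i in range(d))  # per-query normalizer
        out.append([sum(q[i] * kv[i][j] for i in range(d)) / z
                    for j in range(d_v)])
    return out


def implied_weights(Q, K):
    """Explicit attention weights equivalent to linear_attention (inspection only)."""
    phi_q, phi_k = _phi(Q), _phi(K)
    rows = []
    for q in phi_q:
        sims = [sum(a * b for a, b in zip(q, k)) for k in phi_k]
        total = sum(sims)
        rows.append([s / total for s in sims])
    return rows


# Toy tokens (made-up values): 3 tokens, 2-dimensional queries/keys/values.
Q = [[1.0, -0.5], [0.3, 2.0], [-1.0, 0.2]]
K = [[0.5, 0.1], [-0.2, 1.5], [2.0, -1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(linear_attention(Q, K, V))
print(implied_weights(Q, K))  # every weight is positive; each row sums to 1
```

Because every implied weight is a normalized dot product of positive features rather than a softmax, no key is pushed toward zero weight by exponentiation. The sketch illustrates only that mechanism and makes no claim about the resulting detection performance.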