One Dinomaly2 Detect Them All: A Unified Framework for Full-Spectrum Unsupervised Anomaly Detection

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing unsupervised anomaly detection (UAD) methods face two key challenges: (1) multi-class models substantially underperform single-class state-of-the-art (SOTA) approaches, and (2) domain fragmentation—e.g., specialized solutions for 3D, few-shot, or multimodal settings—hinders unified deployment. To address these, we propose Dinomaly2, the first unified UAD framework for the full spectrum of image modalities (2D, 3D, multi-view, infrared). Centered on extreme architectural simplicity, it integrates only five lightweight modules—feature extraction, memory bank, residual modeling, multi-scale fusion, and contrastive learning—to enable zero-shot, cross-task adaptation without task-specific tuning. Built upon a reconstruction paradigm with standardized network architecture, Dinomaly2 achieves new SOTA performance across 12 benchmarks: 99.9% and 99.3% image-level AUROC on MVTec-AD and VisA (multi-class), respectively; remarkably, it retains 98.7% and 97.4% AUROC using merely eight normal samples—surpassing prior full-sample methods. These results underscore the critical role of minimalism in achieving broad generalizability.

📝 Abstract
Unsupervised anomaly detection (UAD) has evolved from building specialized single-class models to unified multi-class models, yet existing multi-class models significantly underperform the most advanced one-for-one counterparts. Moreover, the field has fragmented into specialized methods tailored to specific scenarios (multi-class, 3D, few-shot, etc.), creating deployment barriers and highlighting the need for a unified solution. In this paper, we present Dinomaly2, the first unified framework for full-spectrum image UAD, which bridges the performance gap in multi-class models while seamlessly extending across diverse data modalities and task settings. Guided by the "less is more" philosophy, we demonstrate that the orchestration of five simple elements achieves superior performance in a standard reconstruction-based framework. This methodological minimalism enables natural extension across diverse tasks without modification, establishing that simplicity is the foundation of true universality. Extensive experiments on 12 UAD benchmarks demonstrate Dinomaly2's full-spectrum superiority across multiple modalities (2D, multi-view, RGB-3D, RGB-IR), task settings (single-class, multi-class, inference-unified multi-class, few-shot) and application domains (industrial, biological, outdoor). For example, our multi-class model achieves unprecedented 99.9% and 99.3% image-level (I-) AUROC on MVTec-AD and VisA, respectively. For multi-view and multi-modal inspection, Dinomaly2 demonstrates state-of-the-art performance with minimal adaptation. Moreover, using only 8 normal examples per class, our method surpasses previous full-shot models, achieving 98.7% and 97.4% I-AUROC on MVTec-AD and VisA. The combination of minimalistic design, computational scalability, and universal applicability positions Dinomaly2 as a unified solution for the full spectrum of real-world anomaly detection applications.
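The reconstruction paradigm the abstract refers to can be illustrated in miniature: a frozen encoder's features are reconstructed by a decoder, the residual serves as the anomaly map, and image-level AUROC ranks normal against anomalous images. The sketch below is hypothetical pure-Python pseudocode for that general scheme, not the authors' implementation; the feature vectors and function names are illustrative.

```python
# Hedged sketch of reconstruction-based anomaly scoring: normal images
# reconstruct well (small residual), anomalies do not (large residual).

def anomaly_map(teacher_feats, reconstructed_feats):
    """Per-location anomaly score: squared reconstruction residual."""
    return [(t - r) ** 2 for t, r in zip(teacher_feats, reconstructed_feats)]

def image_score(score_map):
    """Image-level score: the strongest local anomaly response."""
    return max(score_map)

def auroc(scores, labels):
    """Image-level AUROC via the rank statistic (label 1 = anomalous)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with made-up feature values.
normal = anomaly_map([0.2, 0.1, 0.3], [0.21, 0.12, 0.28])
defect = anomaly_map([0.2, 0.9, 0.3], [0.22, 0.15, 0.31])
scores = [image_score(normal), image_score(defect)]
print(auroc(scores, [0, 1]))  # perfect separation -> 1.0
```

The 99.9% I-AUROC reported on MVTec-AD corresponds to near-perfect separation under exactly this ranking metric, computed over all test images.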
Problem

Research questions and friction points this paper is trying to address.

Bridging performance gap in multi-class unsupervised anomaly detection
Unifying fragmented specialized methods across diverse data modalities
Achieving full-spectrum anomaly detection with minimalistic design principles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for full-spectrum unsupervised anomaly detection
Orchestration of five simple elements in reconstruction framework
Minimalistic design enabling universal extension across diverse tasks
Jia Guo
Tsinghua University, Beijing, China
Shuai Lu
Beijing Institute of Technology, Beijing, China
Lei Fan
University of New South Wales, Sydney, Australia
Zelin Li
City University of Hong Kong, Hong Kong SAR
Donglin Di
Li Auto Inc.
Generative Models, Embodied AI, Medical Image, Multimedia
Yang Song
University of New South Wales, Sydney, Australia
Weihang Zhang
Assistant Professor, School of Medical Technology, Beijing Institute of Technology
medical image processing
Wenbing Zhu
Fudan University, Rongcheer
Machine Learning, Computer Vision
Hong Yan
City University of Hong Kong, Hong Kong SAR
Fang Chen
Shanghai Jiao Tong University, Shanghai, China
Huiqi Li
Beijing Institute of Technology, Beijing, China
Hongen Liao
Tsinghua University, Beijing, China; Shanghai Jiao Tong University, Shanghai, China