UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

📅 2024-12-04
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Visual Anomaly Detection (VAD) faces two major challenges: poor cross-domain generalization, and strong dependence on per-class training samples with inconsistent evaluation, both rooted in the prevalent “one-class-one-model” paradigm. This paper proposes UniVAD, a training-free, unified VAD framework that enables cross-domain anomaly detection with only a few normal samples. Its core comprises three synergistic modules: Contextual Component Clustering (C³), which leverages vision foundation models; Component-Aware Patch Matching (CAPM); and Graph-Enhanced Component Modeling (GECM). To the authors' knowledge, UniVAD is the first VAD method to achieve training-free operation, few-shot adaptation, and cross-domain universality simultaneously. Evaluated on nine diverse industrial, logical, and medical benchmarks, UniVAD consistently outperforms domain-specific state-of-the-art methods, establishing a transferable, reusable, and standardized benchmark for VAD and advancing the field toward practical, scalable deployment.
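The summary notes that the outputs of the three modules are aggregated into a final detection result. The exact fusion rule is not specified here; as a hedged, minimal sketch, one plausible scheme min-max normalizes each module's anomaly score map and averages them:

```python
import numpy as np

def fuse_anomaly_maps(maps, eps=1e-8):
    """Fuse per-module anomaly score maps into one final map.
    Each map is min-max normalized to [0, 1], then averaged.
    NOTE: an illustrative assumption, not UniVAD's actual rule."""
    norm = [(m - m.min()) / (np.ptp(m) + eps) for m in maps]
    return np.mean(norm, axis=0)

patch_map = np.array([[0.1, 0.9], [0.2, 0.3]])      # e.g. from patch matching
component_map = np.array([[0.0, 0.8], [0.1, 0.1]])  # e.g. from component modeling
fused = fuse_anomaly_maps([patch_map, component_map])
print(fused.shape)  # (2, 2)
```

Normalizing before averaging keeps one module with a larger score range from dominating the fused map.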

πŸ“ Abstract
Visual Anomaly Detection (VAD) aims to identify abnormal samples in images that deviate from normal patterns, covering multiple domains, including industrial, logical, and medical fields. Due to the domain gaps between these fields, existing VAD methods are typically tailored to each domain, with specialized detection techniques and model architectures that are difficult to generalize across different domains. Moreover, even within the same domain, current VAD approaches often follow a "one-category-one-model" paradigm, requiring large amounts of normal samples to train class-specific models, resulting in poor generalizability and hindering unified evaluation across domains. To address these issues, we propose a generalized few-shot VAD method, UniVAD, capable of detecting anomalies across various domains, such as industrial, logical, and medical anomalies, with a training-free unified model. UniVAD needs only a few normal samples as references during testing to detect anomalies in previously unseen objects, without training on the specific domain. Specifically, UniVAD employs a Contextual Component Clustering ($C^3$) module based on clustering and vision foundation models to accurately segment components within the image, and leverages Component-Aware Patch Matching (CAPM) and Graph-Enhanced Component Modeling (GECM) modules to detect anomalies at different semantic levels, which are aggregated to produce the final detection result. We conduct experiments on nine datasets spanning industrial, logical, and medical fields, and the results demonstrate that UniVAD achieves state-of-the-art performance in few-shot anomaly detection tasks across multiple domains, outperforming domain-specific anomaly detection models. Code is available at https://github.com/FantasticGNU/UniVAD.
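The abstract describes matching test images against a few normal reference samples. A rough illustration of the patch-matching idea underlying modules like CAPM (a minimal sketch with assumed feature shapes, not the paper's implementation): score each test patch by its cosine distance to the nearest normal reference patch.

```python
import numpy as np

def patch_matching_scores(test_patches, ref_patches):
    """Anomaly score per test patch: cosine distance to the nearest
    reference (normal) patch. Higher score = more anomalous.
    NOTE: a simplified sketch, not UniVAD's CAPM module itself."""
    # L2-normalize so the dot product equals cosine similarity.
    t = test_patches / np.linalg.norm(test_patches, axis=1, keepdims=True)
    r = ref_patches / np.linalg.norm(ref_patches, axis=1, keepdims=True)
    sim = t @ r.T                 # (n_test, n_ref) cosine similarities
    return 1.0 - sim.max(axis=1)  # distance to best-matching normal patch

rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 16))      # patch features from a few normal images
normal_like = ref[:4] + 0.01 * rng.normal(size=(4, 16))   # near-normal patches
anomalous = rng.normal(size=(4, 16))                      # unrelated features
print(patch_matching_scores(normal_like, ref).mean()
      < patch_matching_scores(anomalous, ref).mean())     # near-normal scores lower
```

In few-shot settings the reference bank is small, which is what makes this kind of training-free nearest-neighbor matching attractive.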
Problem

Research questions and friction points this paper is trying to address.

Detecting anomalies across multiple domains without domain-specific training.
Using only a few normal samples to detect anomalies in unseen objects.
Improving generalizability and performance in few-shot anomaly detection tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free unified model for anomaly detection
Contextual Component Clustering for image segmentation
Component-Aware Patch Matching for anomaly detection
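The component-clustering idea in the bullets above can be sketched without foundation models: plain k-means over a grid of patch features already yields a coarse component map. This is an illustrative stand-in only; the paper's C³ module additionally uses vision foundation models for accurate segmentation.

```python
import numpy as np

def cluster_components(feat_map, k=2, iters=10):
    """Group a (H, W, D) grid of patch features into k components
    with plain k-means, returning an (H, W) label map.
    NOTE: a simplified stand-in for C^3, which also leverages
    vision foundation models for precise segmentation."""
    H, W, D = feat_map.shape
    x = feat_map.reshape(-1, D)
    # Deterministic init: pick k evenly spaced patches as centers.
    centers = x[np.linspace(0, len(x) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return labels.reshape(H, W)

# Toy grid: two regions with clearly different features.
feat = np.zeros((4, 6, 8))
feat[:, :3] += 1.0   # left "component"
feat[:, 3:] -= 1.0   # right "component"
labels = cluster_components(feat, k=2)
print(labels[0, 0] != labels[0, 5])  # True: regions land in different clusters
```

A per-component label map like this is what lets downstream matching compare each component against its counterpart in the reference images, rather than comparing whole images at once.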
Zhaopeng Gu
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Bingke Zhu
Institute of Automation, Chinese Academy of Sciences
Guibo Zhu
Institute of Automation, Chinese Academy of Sciences
Artificial Intelligence · Computer Vision · Machine Learning
Yingying Chen
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Objecteye Inc., Beijing, China
Ming Tang
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Jinqiao Wang
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Objecteye Inc., Beijing, China