Zero-Shot Image Anomaly Detection Using Generative Foundation Models

📅 2025-07-30

🤖 AI Summary

This work addresses zero-shot out-of-distribution (OOD) image detection in open-world settings, without retraining on each target dataset. It uses the denoising trajectories of a pre-trained denoising diffusion model (DDM) as a source of texture and semantic information, and scores samples by their Stein score errors, amplified through the Structural Similarity Index (SSIM). A single DDM trained on one dataset, CelebA, serves as a cross-domain perceptual template; the authors find CelebA to be an effective base distribution that outperforms the more commonly used ImageNet in several settings. The approach is reported to improve over the state of the art, with near-perfect performance on some benchmarks and notable headroom on others, which the authors present as evidence for the potential of generative foundation models in anomaly detection.

📝 Abstract

Detecting out-of-distribution (OOD) inputs is pivotal for deploying safe vision systems in open-world environments. We revisit diffusion models, not as generators, but as universal perceptual templates for OOD detection. This research explores the use of score-based generative models as foundational tools for semantic anomaly detection across unseen datasets. Specifically, we leverage the denoising trajectories of Denoising Diffusion Models (DDMs) as a rich source of texture and semantic information. By analyzing Stein score errors, amplified through the Structural Similarity Index Metric (SSIM), we introduce a novel method for identifying anomalous samples without requiring re-training on each target dataset. Our approach improves over state-of-the-art and relies on training a single model on one dataset -- CelebA -- which we find to be an effective base distribution, even outperforming more commonly used datasets like ImageNet in several settings. Experimental results show near-perfect performance on some benchmarks, with notable headroom on others, highlighting both the strength and future potential of generative foundation models in anomaly detection.
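
To make the scoring idea concrete, here is a minimal sketch of a score error amplified by SSIM. It is not the paper's implementation, and everything in it is an assumption made for illustration: `box_blur_denoiser` is a hypothetical stand-in for a pre-trained diffusion denoiser, the noise levels in `sigmas` are arbitrary, `global_ssim` is a single-window simplification of SSIM, and multiplying the score error by `1 - SSIM` is only one plausible reading of "amplified". The score estimate uses the standard relation between a denoiser output and the score of the noised distribution (Tweedie's formula).

```python
import numpy as np


def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over the whole image (simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def box_blur_denoiser(x_noisy, sigma):
    """Hypothetical stand-in for a pre-trained denoiser: a 3x3 box blur.

    A real diffusion model would condition on sigma; this toy ignores it.
    """
    h, w = x_noisy.shape
    padded = np.pad(x_noisy, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def anomaly_score(x, denoiser, sigmas=(0.1, 0.2, 0.4), seed=0):
    """Average SSIM-amplified score error over a few noise levels (assumed form)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    total = 0.0
    for sigma in sigmas:
        x_noisy = x + sigma * rng.standard_normal(x.shape)
        x_hat = denoiser(x_noisy, sigma)
        # Tweedie: estimated score is (x_hat - x_noisy) / sigma^2, while the
        # conditional target is (x - x_noisy) / sigma^2. Their difference is
        # (x_hat - x) / sigma^2; weighting the squared error by sigma^2 gives:
        score_err = np.mean(((x_hat - x) / sigma) ** 2)
        # Assumed amplification: low structural similarity inflates the error.
        total += score_err * (1.0 - global_ssim(x, x_hat))
    return total / len(sigmas)


# A smooth gradient matches what the blur "expects"; a checkerboard does not.
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
checker = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)
smooth_score = anomaly_score(smooth, box_blur_denoiser)
checker_score = anomaly_score(checker, box_blur_denoiser)
print(f"smooth: {smooth_score:.4f}  checkerboard: {checker_score:.4f}")
```

With this toy denoiser the checkerboard scores far higher than the gradient, because the blur reconstructs smooth content well and high-frequency content poorly. In the setting the abstract describes, the denoiser would instead be a diffusion model trained once on CelebA, and the score would be compared across in-distribution and OOD samples from the target dataset.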

Problem

Research questions and friction points this paper is trying to address.

Detecting out-of-distribution inputs so that vision systems can be deployed safely in open-world environments

Using diffusion models as perceptual templates for anomaly detection rather than as generators

Identifying anomalies without retraining on each target dataset

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using diffusion models as perceptual templates

Leveraging denoising trajectories for anomaly detection

Analyzing Stein score errors amplified through SSIM

🔎 Similar Papers

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection