Zero-Shot Image Anomaly Detection Using Generative Foundation Models

📅 2025-07-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses open-world, zero-shot image anomaly detection that requires no target-domain data for retraining. Methodologically, it uses the denoising trajectories of a pre-trained denoising diffusion model (DDM) to extract multi-scale texture and semantic features, and separates out-of-distribution (OOD) samples from in-distribution ones using Stein score errors amplified by SSIM. Crucially, it repurposes a single DDM, trained only on CelebA, as a cross-domain perceptual template, enabling detection of both semantic and structural anomalies under zero-shot conditions. Across multiple benchmarks, the approach improves over the state of the art, reaching near-perfect performance on some benchmarks while leaving notable headroom on others, and CelebA proves a more effective base distribution than ImageNet in several settings. These results support the feasibility and generalization ability of generative foundation models as universal anomaly detectors.
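
In standard DDPM notation, the quantities named in the summary can be written down explicitly. The first three lines below are textbook identities: the forward noising step, the Stein score of the Gaussian noising kernel with its noise-prediction counterpart, and the resulting per-timestep score error. The last line is only a hypothetical way of combining score error and SSIM, since neither the summary nor the abstract specifies the exact aggregation; the symbols ᾱ_t, ε_θ, e_t, T (a set of timesteps) and d (number of pixels) are introduced here for illustration.

```latex
\begin{aligned}
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I) \\
\nabla_{x_t}\log q(x_t \mid x_0) &= -\frac{\epsilon}{\sqrt{1-\bar\alpha_t}},
  \qquad s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar\alpha_t}} \\
e_t &= s_\theta(x_t, t) - \nabla_{x_t}\log q(x_t \mid x_0)
     = -\frac{\epsilon_\theta(x_t, t) - \epsilon}{\sqrt{1-\bar\alpha_t}} \\
S(x_0) &= \frac{1}{|\mathcal{T}|}\sum_{t \in \mathcal{T}}
  \bigl(1 - \mathrm{SSIM}(\epsilon_\theta(x_t, t), \epsilon)\bigr)\,
  \frac{1}{d}\lVert e_t \rVert_2^2
\end{aligned}
```
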

📝 Abstract
Detecting out-of-distribution (OOD) inputs is pivotal for deploying safe vision systems in open-world environments. We revisit diffusion models, not as generators, but as universal perceptual templates for OOD detection. This research explores the use of score-based generative models as foundational tools for semantic anomaly detection across unseen datasets. Specifically, we leverage the denoising trajectories of Denoising Diffusion Models (DDMs) as a rich source of texture and semantic information. By analyzing Stein score errors, amplified through the Structural Similarity Index Measure (SSIM), we introduce a novel method for identifying anomalous samples without re-training on each target dataset. Our approach improves over the state of the art and relies on training a single model on one dataset, CelebA, which we find to be an effective base distribution that even outperforms more commonly used datasets such as ImageNet in several settings. Experimental results show near-perfect performance on some benchmarks, with notable headroom on others, highlighting both the strength and future potential of generative foundation models in anomaly detection.
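
As a rough illustration of how SSIM-amplified score errors along a denoising trajectory could yield an anomaly score, the sketch below noises an image at a few timesteps, compares a denoiser's noise prediction with the true noise, and weights the resulting score error by 1 - SSIM. It is not the authors' implementation: the linear beta schedule, the chosen timesteps, the single-window SSIM, the weighting rule, and all function names (`ssim_weighted_score_error`, `make_memorizing_denoiser`, `global_ssim`, `linear_alpha_bar`) are assumptions, and the toy "memorizing" denoiser merely stands in for a real DDM trained on CelebA. The conditional score of the noising kernel is used as the reference score, as in denoising score matching.

```python
import numpy as np
from typing import Callable, Sequence

# Hypothetical interface: eps_model(x_t, t) returns the predicted noise for x_t.
EpsModel = Callable[[np.ndarray, int], np.ndarray]


def linear_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def global_ssim(a: np.ndarray, b: np.ndarray, data_range: float) -> float:
    """SSIM over a single window spanning the whole array.

    Simplification: standard SSIM averages this statistic over local
    (typically Gaussian-weighted) windows.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)


def ssim_weighted_score_error(x0: np.ndarray, eps_model: EpsModel,
                              timesteps: Sequence[int], alpha_bar: np.ndarray,
                              rng: np.random.Generator) -> float:
    """Hypothetical anomaly score: mean over timesteps of
    (1 - SSIM(predicted noise, true noise)) * mean squared score error."""
    per_step = []
    for t in timesteps:
        eps = rng.standard_normal(x0.shape)
        # Forward diffusion: one point on the noising trajectory of x0.
        x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        eps_hat = eps_model(x_t, t)
        # Stein score error: s_theta(x_t, t) - grad log q(x_t | x0).
        score_err = -(eps_hat - eps) / np.sqrt(1.0 - alpha_bar[t])
        data_range = max(float(eps.max() - eps.min()), 1e-8)
        # Structural mismatch between predicted and true noise scales the error.
        weight = 1.0 - global_ssim(eps_hat, eps, data_range)
        per_step.append(weight * np.mean(score_err ** 2))
    return float(np.mean(per_step))


def make_memorizing_denoiser(x_ref: np.ndarray, alpha_bar: np.ndarray) -> EpsModel:
    """Toy stand-in for a trained DDM whose 'training set' is the single image x_ref."""
    def eps_model(x_t: np.ndarray, t: int) -> np.ndarray:
        return (x_t - np.sqrt(alpha_bar[t]) * x_ref) / np.sqrt(1.0 - alpha_bar[t])
    return eps_model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = linear_alpha_bar()
    x_ref = rng.uniform(-1.0, 1.0, size=(32, 32))
    model = make_memorizing_denoiser(x_ref, alpha_bar)
    steps = [50, 200, 500, 800]
    print("in-distribution:", ssim_weighted_score_error(x_ref, model, steps, alpha_bar, rng))
    print("anomalous:", ssim_weighted_score_error(np.rot90(x_ref), model, steps, alpha_bar, rng))
```

With the toy denoiser, the image it memorized scores essentially zero, while a rotated copy that the model never saw receives a large score. A real DDM generalizes beyond its training images, so the gap between normal and anomalous inputs would presumably be smaller and noisier.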
Problem

Research questions and friction points this paper is trying to address.

Detects out-of-distribution inputs for safe vision systems
Uses diffusion models as perceptual templates for anomaly detection
Identifies anomalies without retraining on target datasets
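
Because the detector is frozen, applying it to a new dataset only requires scoring images; nothing is retrained. The sketch below shows two common ways to turn such scores into decisions or metrics: a scalar threshold chosen from reference in-distribution scores, and the threshold-free AUROC. The source states neither the evaluation metric nor any thresholding rule, so both, along with the helper names and the Gaussian stand-in scores, are illustrative assumptions.

```python
import numpy as np
from typing import Sequence

# Hypothetical helpers; not taken from the paper.


def calibrate_threshold(reference_scores: Sequence[float], target_fpr: float = 0.05) -> float:
    """Threshold such that roughly `target_fpr` of reference (in-distribution)
    images would be flagged. Only this scalar is chosen; the model is untouched."""
    return float(np.quantile(np.asarray(reference_scores, dtype=float), 1.0 - target_fpr))


def flag_anomalies(scores: Sequence[float], threshold: float) -> np.ndarray:
    """Higher score means more anomalous; returns a boolean mask."""
    return np.asarray(scores, dtype=float) > threshold


def auroc(in_scores: Sequence[float], out_scores: Sequence[float]) -> float:
    """Probability that a random OOD score exceeds a random in-distribution
    score, counting ties as one half (normalized Mann-Whitney U statistic)."""
    s_in = np.asarray(in_scores, dtype=float)[:, None]
    s_out = np.asarray(out_scores, dtype=float)[None, :]
    return float(np.mean((s_out > s_in) + 0.5 * (s_out == s_in)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(0.0, 1.0, size=1000)  # stand-in in-distribution scores
    target_ood = rng.normal(3.0, 1.0, size=500)  # stand-in anomalous scores
    thr = calibrate_threshold(reference)
    print("flagged OOD fraction:", flag_anomalies(target_ood, thr).mean())
    print("AUROC:", auroc(reference, target_ood))
```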
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using diffusion models as perceptual templates
Leveraging denoising trajectories for anomaly detection
Analyzing Stein score errors with SSIM
Authors
Lemar Abdi
Eindhoven University of Technology, The Netherlands
Amaan Valiuddin
Eindhoven University of Technology, The Netherlands
Francisco Caetano
Eindhoven University of Technology, The Netherlands
Christiaan Viviers
Eindhoven University of Technology, The Netherlands
Fons van der Sommen
Associate Professor, Eindhoven University of Technology
Image Processing, Computer Vision, Medical Image Analysis, Computer-Aided Diagnosis, Machine Learning