🤖 AI Summary
This work addresses open-world, zero-shot out-of-distribution (OOD) image detection, where no retraining on the target dataset is required. Methodologically, it uses the denoising trajectories of a pre-trained denoising diffusion model (DDM) as a source of texture and semantic features, and scores samples by their Stein score errors, amplified through the Structural Similarity Index Measure (SSIM). A single DDM trained on CelebA serves as a cross-domain perceptual template; CelebA turns out to be an effective base distribution, outperforming the more commonly used ImageNet as a training set in several settings. The approach improves over the state of the art, reaching near-perfect performance on some benchmarks while leaving notable headroom on others. These results support the feasibility and generalization potential of generative foundation models as general-purpose anomaly detectors.
📝 Abstract
Detecting out-of-distribution (OOD) inputs is pivotal for deploying safe vision systems in open-world environments. We revisit diffusion models, not as generators, but as universal perceptual templates for OOD detection. This research explores the use of score-based generative models as foundational tools for semantic anomaly detection across unseen datasets. Specifically, we leverage the denoising trajectories of Denoising Diffusion Models (DDMs) as a rich source of texture and semantic information. By analyzing Stein score errors, amplified through the Structural Similarity Index Measure (SSIM), we introduce a novel method for identifying anomalous samples without requiring re-training on each target dataset. Our approach improves over the state of the art and relies on training a single model on one dataset, CelebA, which we find to be an effective base distribution, even outperforming more commonly used datasets like ImageNet in several settings. Experimental results show near-perfect performance on some benchmarks, with notable headroom on others, highlighting both the strength and future potential of generative foundation models in anomaly detection.
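To make the scoring idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a variance-exploding noising step `x_noisy = x + sigma * eps`, under which the noise-prediction error equals the Stein score error scaled by `sigma`. The names `toy_denoiser`, `global_ssim`, and `anomaly_score` are hypothetical, the toy denoiser is a closed-form stand-in for a pre-trained DDM, and the weighting `1 + (1 - SSIM)` is only one assumed way of "amplifying" the error through SSIM.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image.

    Simplification: standard SSIM averages this statistic over local windows.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def toy_denoiser(x_noisy, sigma, prior_mean=0.5, prior_std=0.1):
    """Hypothetical stand-in for a pre-trained DDM.

    Posterior-mean denoiser for pixels drawn i.i.d. from
    N(prior_mean, prior_std**2); a real model would be a neural network.
    """
    k = prior_std ** 2 / (prior_std ** 2 + sigma ** 2)
    return prior_mean + k * (x_noisy - prior_mean)


def anomaly_score(x, denoiser, sigmas, rng):
    """Average SSIM-weighted noise-prediction error over several noise levels."""
    total = 0.0
    for sigma in sigmas:
        eps = rng.standard_normal(x.shape)
        x_noisy = x + sigma * eps              # one point on the noising path
        x0_hat = denoiser(x_noisy, sigma)      # denoised estimate of x
        eps_hat = (x_noisy - x0_hat) / sigma   # implied noise prediction
        score_err = np.mean((eps_hat - eps) ** 2)
        # Assumed amplification: poorer structural agreement gives a larger weight.
        weight = 1.0 + (1.0 - global_ssim(x, x0_hat))
        total += score_err * weight
    return total / len(sigmas)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_in = 0.5 + 0.1 * rng.standard_normal((32, 32))                 # matches the toy prior
    x_ood = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)    # checkerboard
    sigmas = [0.1, 0.3, 0.5]
    print("in-distribution score:", anomaly_score(x_in, toy_denoiser, sigmas, rng))
    print("checkerboard score:   ", anomaly_score(x_ood, toy_denoiser, sigmas, rng))
```

In this toy setting the checkerboard image scores higher than the sample drawn from the denoiser's own prior, which is the behavior an OOD detector relies on. With a real DDM, `toy_denoiser` would be replaced by the network's denoising step, and a windowed SSIM would likely be preferable to the global variant used here.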