CoPS: Conditional Prompt Synthesis for Zero-Shot Anomaly Detection

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Zero-shot anomaly detection (ZSAD) faces two key challenges: static learnable prompts struggle to capture the continuous diversity of normal and anomalous states, and fixed textual labels are semantically sparse, leaving the model prone to overfitting to a narrow semantic subspace. To address these, we propose CoPS, a conditional prompt synthesis framework that dynamically generates prompts conditioned on visual features, jointly integrating fine-grained learnable prototypes with variational autoencoder–based implicit class encoding to enable state-adaptive perception and semantic enrichment. A spatially aware alignment mechanism further mitigates prompt rigidity and label sparsity. Evaluated on 13 industrial and medical datasets, CoPS improves average AUROC by 2.5% on both classification and segmentation, demonstrating markedly stronger cross-class generalization.

📝 Abstract
Recently, large pre-trained vision-language models have shown remarkable performance in zero-shot anomaly detection (ZSAD). With fine-tuning on a single auxiliary dataset, the model enables cross-category anomaly detection on diverse datasets covering industrial defects and medical lesions. Compared to manually designed prompts, prompt learning eliminates the need for expert knowledge and trial-and-error. However, it still faces the following challenges: (i) static learnable tokens struggle to capture the continuous and diverse patterns of normal and anomalous states, limiting generalization to unseen categories; (ii) fixed textual labels provide overly sparse category information, making the model prone to overfitting to a specific semantic subspace. To address these issues, we propose Conditional Prompt Synthesis (CoPS), a novel framework that synthesizes dynamic prompts conditioned on visual features to enhance ZSAD performance. Specifically, we extract representative normal and anomaly prototypes from fine-grained patch features and explicitly inject them into prompts, enabling adaptive state modeling. Given the sparsity of class labels, we leverage a variational autoencoder to model semantic image features and implicitly fuse varied class tokens into prompts. Combined with our spatially-aware alignment mechanism, extensive experiments demonstrate that CoPS surpasses state-of-the-art methods by 2.5% AUROC in both classification and segmentation across 13 industrial and medical datasets. Code will be available at https://github.com/cqylunlun/CoPS.
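The core idea of the abstract — synthesizing prompt tokens conditioned on visual features, then scoring patches against "normal" and "anomalous" text embeddings — can be sketched as follows. This is a minimal illustration, not the paper's architecture: the single-linear-layer conditioner, the mean-pooling stand-in for the text encoder, and all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 64      # shared embedding dimension (illustrative)
N_CTX = 4   # learnable context tokens per state (illustrative)

# Static learnable context tokens for the "normal" and "anomalous" states.
# In the paper these would be optimized; here they are random stand-ins.
ctx_normal = rng.normal(size=(N_CTX, D))
ctx_anomal = rng.normal(size=(N_CTX, D))

# Hypothetical conditioner: maps a global visual feature to one extra prompt
# token, so the prompt adapts to each input image ("conditional synthesis").
W_cond = rng.normal(size=(D, D)) / np.sqrt(D)

def synthesize_prompt(ctx, img_feat):
    """Concatenate static context tokens with a visually conditioned token."""
    cond_token = np.tanh(img_feat @ W_cond)   # (D,)
    return np.vstack([ctx, cond_token])       # (N_CTX + 1, D)

def text_embedding(prompt_tokens):
    """Stand-in for the text encoder: mean-pool and L2-normalize."""
    v = prompt_tokens.mean(axis=0)
    return v / np.linalg.norm(v)

def anomaly_score(patch_feats, img_feat):
    """Per-patch softmax over cosine similarity to the two state prompts."""
    t_n = text_embedding(synthesize_prompt(ctx_normal, img_feat))
    t_a = text_embedding(synthesize_prompt(ctx_anomal, img_feat))
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    logits = np.stack([p @ t_n, p @ t_a], axis=1)      # (num_patches, 2)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    return probs[:, 1]                                 # P(anomalous) per patch

patch_feats = rng.normal(size=(16, D))   # 16 dummy patch features
img_feat = patch_feats.mean(axis=0)      # global image feature
scores = anomaly_score(patch_feats, img_feat)
```

Because the conditioned token depends on the image, two different images yield two different prompt embeddings, which is exactly the flexibility the abstract argues static tokens lack.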
Problem

Research questions and friction points this paper is trying to address.

Static tokens fail to capture diverse anomaly patterns
Fixed labels lack detailed category information
Existing methods overfit to specific semantic subspaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic prompts conditioned on visual features
Normal and anomaly prototypes from patch features
Variational autoencoder for semantic feature modeling
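The last bullet — a variational autoencoder that models semantic image features so that varied class tokens can be fused into prompts — rests on the standard VAE reparameterization trick. A minimal sketch under stated assumptions: the linear encoder/decoder and all names and sizes here are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
D, Z = 32, 8   # feature and latent dimensions (illustrative)

# Hypothetical linear encoder producing mean and log-variance of q(z|x),
# plus a linear decoder mapping latents back to prompt-token space.
W_mu = rng.normal(size=(D, Z))
W_lv = rng.normal(size=(D, Z))
W_dec = rng.normal(size=(Z, D))

def sample_class_token(img_feat):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).

    Each call draws a fresh eps, so the class token injected into the
    prompt varies around the image's semantics instead of being one
    fixed textual label.
    """
    mu, logvar = img_feat @ W_mu, img_feat @ W_lv
    eps = rng.standard_normal(Z)
    z = mu + np.exp(0.5 * logvar) * eps
    return z @ W_dec   # implicit class token in prompt space

img_feat = rng.normal(size=D)
t1 = sample_class_token(img_feat)
t2 = sample_class_token(img_feat)
```

Sampling twice from the same image yields two distinct but semantically related tokens, which is how implicit class encoding sidesteps the sparsity of a single fixed label.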
Qiyu Chen
Institute of Automation, Chinese Academy of Sciences
Anomaly Detection · Computer Vision · Deep Learning
Zhen Qu
Institute of Automation, Chinese Academy of Sciences
Wei Luo
Tsinghua University
Haiming Yao
Tsinghua University
Anomaly Detection · Multi-Task Learning · AI for Science · Fine-tuning
Yunkang Cao
Hunan University
Visual Anomaly Detection · Industrial Foundation Model · Embodied Intelligence
Yuxin Jiang
Huazhong University of Science and Technology
Yinan Duan
Tsinghua University
Huiyuan Luo
Institute of Automation, Chinese Academy of Sciences
Chengkan Lv
Institute of Automation, Chinese Academy of Sciences
Zhengtao Zhang
Institute of Automation, Chinese Academy of Sciences