Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation

📅 2025-12-10
🤖 AI Summary
Problem: Existing vision-language models (e.g., CLIP) struggle to distinguish fine-grained anomaly types (e.g., “holes”, “scratches”) in zero-shot industrial defect detection. This leads to imprecise localization and poor attribution of defects, while handcrafted per-type prompts, the obvious workaround, are labor-intensive and prone to human bias. Method: We propose DAPO, a defect-aware prompt optimization framework based on progressive tuning. It learns hybrid prompts that combine fixed textual anchors with learnable token embeddings, aligning anomaly-relevant image features with fine-grained defect semantics without handcrafting a prompt for each defect type. By combining cross-modal feature alignment with defect-type-driven prompt modeling, it extends CLIP to zero-shot multi-type and binary defect detection and pixel-level segmentation. Results: On five public benchmarks (MPDD, VisA, MVTec-AD, MAD, and Real-IAD) and an internal dataset, the method achieves, relative to baseline models, an average improvement of 3.7% in image-level AUROC and AP under distribution shift and an average improvement of 6.5% in localizing novel anomaly types in the zero-shot setting.
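Image-level AUROC and AP are both ranking metrics over per-image anomaly scores. As a hedged illustration of what they measure, the pure-Python sketch below computes them from scratch. It is not the paper's evaluation code, and benchmark protocols (for example, per-category averaging) may differ. It assumes labels of 1 for anomalous and 0 for normal images, one scalar score per image (higher means more anomalous), and, for AP, scores without ties; `auroc` and `average_precision` are illustrative names. In practice, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score` compute the same quantities.

```python
"""Sketch: image-level AUROC and average precision (AP) from anomaly scores.

Assumptions: label 1 = anomalous image, 0 = normal image; one scalar score per
image, higher meaning more anomalous. Illustrative code, not the paper's.
"""
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomalous image outscores a random normal one.

    Ties between an anomalous and a normal score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each anomalous image.

    Ranks images by descending score; tied scores are not grouped.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one anomalous image")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this anomalous image's rank
    return total / n_pos
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, one of the four anomalous/normal pairs is mis-ordered, so AUROC is 0.75, and AP is (1/1 + 2/3) / 2 ≈ 0.833.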

📝 Abstract
Recent vision-language models (VLMs) such as CLIP have demonstrated impressive anomaly detection performance under significant distribution shift by exploiting high-level semantic information through text prompts. However, these models often neglect fine-grained details, such as the specific type of anomaly (e.g., "hole", "cut", "scratch"), which could provide more specific insight into its nature. We argue that recognizing fine-grained anomaly types 1) enriches the representation of "abnormal" with structured semantics, narrowing the gap between coarse anomaly signals and fine-grained defect categories; and 2) enables manufacturers to understand the root causes of anomalies and quickly implement more targeted corrective measures. While incorporating such detailed semantic information is crucial, designing handcrafted prompts for each defect type is both time-consuming and susceptible to human bias. We therefore introduce DAPO, a novel approach for Defect-aware Prompt Optimization based on progressive tuning for zero-shot multi-type and binary anomaly detection and segmentation under distribution shifts. Our approach aligns anomaly-relevant image features with their corresponding text semantics by learning hybrid defect-aware prompts that combine fixed textual anchors with learnable token embeddings. We conducted experiments on public benchmarks (MPDD, VisA, MVTec-AD, MAD, and Real-IAD) and an internal dataset. The results suggest that, compared to the baseline models, DAPO achieves a 3.7% average improvement in image-level AUROC and average precision under distribution shift, and a 6.5% average improvement in localizing novel anomaly types under zero-shot settings.
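The abstract describes aligning anomaly-relevant image features with per-defect text semantics. A common way for CLIP-based detectors to turn such alignment into zero-shot outputs, assumed here rather than taken from the paper, is to apply a softmax over patch-to-text cosine similarities. The probability mass off the "normal" class gives a binary anomaly map, and the most probable defect class gives a type map. The pure-Python sketch below assumes the text and patch embeddings are already computed; `score_patches`, the class names, and the temperature value are hypothetical.

```python
"""Sketch: CLIP-style zero-shot anomaly scoring and defect typing per image patch.

Hypothetical setup, not DAPO's published code:
- text_embs maps class names ("normal", "hole", ...) to text embeddings that some
  text encoder has already produced from each class's prompt;
- patch_embs holds one embedding per image patch, projected into the same space;
- temperature=0.07 echoes CLIP's usual logit scale but is an illustrative choice.
"""
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_patches(
    patch_embs: List[Vector], text_embs: Dict[str, Vector], temperature: float = 0.07
) -> Tuple[List[float], List[str]]:
    """Return (anomaly map, defect-type map), one entry per patch."""
    if "normal" not in text_embs or len(text_embs) < 2:
        raise ValueError("need a 'normal' class plus at least one defect type")
    names = list(text_embs)
    defect_types = [n for n in names if n != "normal"]
    anomaly_map, type_map = [], []
    for patch in patch_embs:
        logits = [cosine(patch, text_embs[n]) / temperature for n in names]
        probs = dict(zip(names, softmax(logits)))
        # Binary anomaly score: probability mass on every non-"normal" class.
        anomaly_map.append(1.0 - probs["normal"])
        # Multi-type output: the most probable defect class for this patch.
        type_map.append(max(defect_types, key=lambda n: probs[n]))
    return anomaly_map, type_map


if __name__ == "__main__":
    texts = {"normal": [1.0, 0.0], "hole": [0.0, 1.0], "scratch": [-1.0, 0.0]}
    patches = [[0.9, 0.1], [0.1, 0.9], [-0.8, 0.2]]
    amap, tmap = score_patches(patches, texts)
    print("image-level score:", max(amap))  # one simple choice: max over patches
    print("types:", tmap)
```

Taking the maximum of the anomaly map as the image-level score is only one simple aggregation. Upsampling the patch-level map to image resolution is the usual route from such scores to a pixel-level segmentation mask.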
Problem

Research questions and friction points this paper is trying to address.

Optimizes prompts to detect specific anomaly types like holes or scratches
Enables zero-shot anomaly detection without manual prompt engineering
Improves localization of novel defects under distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid defect-aware prompts with fixed anchors and learnable tokens
Progressive tuning for zero-shot multi-type anomaly detection
Aligns anomaly-relevant image features with their corresponding text semantics
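The hybrid prompts listed above pair fixed textual anchors with learnable token embeddings. The toy sketch below illustrates only that mechanism, under heavy simplifications that are assumptions rather than the paper's design. The "text encoder" is the mean of the token embeddings, and the objective is the squared distance to a target image feature. Plain gradient descent updates the learnable tokens, while the anchor tokens (standing in for the embedding of a defect word such as "hole") stay frozen. `HybridPrompt` and its methods are hypothetical names, and the progressive tuning schedule, which this summary does not describe, is not modeled.

```python
"""Sketch: a hybrid prompt with frozen anchor tokens and learnable context tokens.

Hypothetical simplifications (not DAPO's actual architecture):
- the "text encoder" is just the mean of the prompt's token embeddings;
- the tuning objective is the squared distance between that mean and a target
  image feature;
- plain gradient descent updates only the learnable tokens; the anchor is frozen.
"""
from dataclasses import dataclass
from typing import List

Vector = List[float]


def mean_vec(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


@dataclass
class HybridPrompt:
    learnable: List[Vector]  # context tokens, updated during tuning
    anchor: List[Vector]     # embedding of a fixed defect word (e.g. "hole"), frozen

    def encode(self) -> Vector:
        # Stand-in text encoder: average over all prompt tokens.
        return mean_vec(self.learnable + self.anchor)

    def loss(self, target: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(self.encode(), target))

    def step(self, target: Vector, lr: float) -> None:
        n_tokens = len(self.learnable) + len(self.anchor)
        feat = self.encode()
        # Gradient of ||mean - target||^2 w.r.t. any single token: 2 * (mean - target) / n.
        grad = [2.0 * (f - t) / n_tokens for f, t in zip(feat, target)]
        # Only the learnable tokens move; the anchor tokens' gradients are discarded.
        self.learnable = [[w - lr * g for w, g in zip(tok, grad)] for tok in self.learnable]


if __name__ == "__main__":
    prompt = HybridPrompt(learnable=[[0.0, 0.0], [0.0, 0.0]], anchor=[[1.0, 0.0]])
    target = [0.0, 1.0]  # hypothetical image feature for this defect type
    print("loss before:", prompt.loss(target))
    for _ in range(200):
        prompt.step(target, lr=0.5)
    print("loss after:", prompt.loss(target))
    print("anchor unchanged:", prompt.anchor)
```

Because the anchor tokens never move, the prompt keeps an explicit, human-readable defect word while the learnable context absorbs whatever adaptation the objective demands. This is offered as an interpretation of why a hybrid design can help, not as a claim from the paper.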