MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection

📅 2025-03-17
🤖 AI Summary
Existing zero-shot industrial anomaly detection methods often miss small and irregularly shaped defects, primarily because they rely on a single handcrafted prompt, which generalizes poorly. To address this, we propose a Multi-Form Prompting (MFP) mechanism that combines Image-to-Text Prompting (I2TP), Self-Prompting (SP), Mask Prompting (MP), and a Multi-Patch Feature Aggregation (MPFA) module. It is presented as the first approach to jointly integrate these prompting strategies within the CLIP framework for cross-category anomaly identification and pixel-level localization without training samples from the target category. By moving beyond the representational bottleneck of single-prompt paradigms, MFP increases sensitivity to subtle and structurally complex anomalies. Evaluated on the MVTec-AD and VisA benchmarks, our method improves over state-of-the-art approaches in both zero-shot classification and localization, indicating stronger robustness and generalization across diverse industrial defect types.

📝 Abstract
Recently, zero-shot anomaly detection (ZSAD) has emerged as a pivotal paradigm for identifying defects in unseen categories without requiring target samples in the training phase. However, existing ZSAD methods struggle with the boundaries of small and complex defects due to insufficient representations. Most of them use a single manually designed prompt, which fails to work for diverse objects and anomalies. In this paper, we propose MFP-CLIP, a novel prompt-based CLIP framework that explores the efficacy of multi-form prompts for zero-shot industrial anomaly detection. We employ an image-to-text prompting (I2TP) mechanism to better represent the object in the image. MFP-CLIP enhances perception of multi-scale and complex anomalies through self-prompting (SP) and a multi-patch feature aggregation (MPFA) module. To precisely localize defects, we introduce a mask prompting (MP) module that guides the model to focus on potential anomaly regions. Extensive experiments are conducted on two widely used industrial anomaly detection benchmarks, MVTecAD and VisA, demonstrating MFP-CLIP's superiority in ZSAD.
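For illustration, the generic CLIP-style zero-shot scoring that prompt-based ZSAD methods build on can be sketched as follows. This is a minimal sketch under stated assumptions, not MFP-CLIP's implementation: plain Python lists stand in for CLIP image and text embeddings, and the function names and the temperature value are hypothetical. Instead of relying on one handcrafted prompt, it averages several prompt embeddings per state (normal, abnormal) and turns the two similarities into an anomaly probability.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_embedding(embeddings):
    """Average unit-normalized prompt embeddings into one prototype."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def anomaly_score(image_emb, normal_prompt_embs, abnormal_prompt_embs,
                  temperature=0.07):
    """Softmax probability of the 'abnormal' state (hypothetical helper).

    The temperature is an assumed value, not one taken from the paper.
    """
    normal_proto = mean_embedding(normal_prompt_embs)
    abnormal_proto = mean_embedding(abnormal_prompt_embs)
    s_normal = cosine(image_emb, normal_proto) / temperature
    s_abnormal = cosine(image_emb, abnormal_proto) / temperature
    # Subtract the max logit for a numerically stable softmax.
    m = max(s_normal, s_abnormal)
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)
```

Applying the same function to each patch embedding rather than to a single image embedding yields a coarse anomaly map, which is the usual route from image-level classification to localization in this family of methods.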
Problem

Research questions and friction points this paper is trying to address.

Improves zero-shot anomaly detection for unseen categories
Addresses insufficient representation of small, complex defects
Enhances defect localization with multi-form prompt mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-form prompts replace the single handcrafted prompt used by most ZSAD methods
Image-to-text prompting improves object representation
Mask prompting guides the model toward potential anomaly regions for more precise localization
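The abstract also credits a multi-patch feature aggregation (MPFA) module with improving perception of multi-scale anomalies, without giving its details here. One common way to aggregate patch features at several scales is to average each patch with its neighbors over windows of different sizes. The sketch below shows that generic idea only; it is an assumption for illustration, not the paper's MPFA, and the function names and window sizes are hypothetical.

```python
def pool_window(grid, k):
    """Average each patch feature with its neighbors in a k x k window.

    grid is an H x W nested list of feature vectors; k is assumed odd.
    Windows are clipped at the borders, so edge patches average fewer cells.
    """
    h, w = len(grid), len(grid[0])
    dim = len(grid[0][0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = [0.0] * dim
            count = 0
            for y in range(max(0, i - r), min(h, i + r + 1)):
                for x in range(max(0, j - r), min(w, j + r + 1)):
                    for d in range(dim):
                        acc[d] += grid[y][x][d]
                    count += 1
            row.append([a / count for a in acc])
        out.append(row)
    return out


def aggregate_multi_patch(grid, window_sizes=(1, 3)):
    """Average the pooled features across all window sizes (assumed scheme)."""
    pooled = [pool_window(grid, k) for k in window_sizes]
    h, w = len(grid), len(grid[0])
    dim = len(grid[0][0])
    return [
        [
            [sum(p[i][j][d] for p in pooled) / len(pooled) for d in range(dim)]
            for j in range(w)
        ]
        for i in range(h)
    ]
```

Larger windows give each patch more context, which helps with defects that span several patches, while the size-1 window keeps the original fine-grained feature for small defects.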
Jingyi Yuan
the School of Intelligent Systems Engineering, Sun Yat-Sen University
Pengyu Jie
School of intelligent engineering, Sun Yat-sen University
machine learning, medical imaging, computer vision
Junyin Zhang
the School of Intelligent Systems Engineering, Sun Yat-Sen University
Ziao Li
the School of Intelligent Systems Engineering, Sun Yat-Sen University
Chenqiang Gao
the School of Intelligent Systems Engineering, Sun Yat-Sen University