MFP-CLIP: Exploring the Efficacy of Multi-Form Prompts for Zero-Shot Industrial Anomaly Detection

📅 2025-03-17

🤖 AI Summary

Existing zero-shot industrial anomaly detection methods frequently miss small and irregularly shaped defects, largely because they rely on a single handcrafted prompt that generalizes poorly. To address this, MFP-CLIP introduces a Multi-Form Prompting (MFP) mechanism built from Image-to-Text Prompting (I2TP), Self-Prompting (SP), and Mask Prompting (MP), together with a Multi-Patch Feature Aggregation (MPFA) module. It is presented as the first work to jointly integrate these prompting strategies within the CLIP framework for cross-category, target-sample-free anomaly classification and pixel-level localization. By moving beyond the representational bottleneck of single-prompt paradigms, MFP makes the model more sensitive to subtle and structurally complex anomalies. On the MVTec AD and VisA benchmarks, the method achieves substantial improvements over state-of-the-art approaches in both zero-shot classification and localization, demonstrating strong robustness and generalization across diverse industrial defect types.
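Prompt-based zero-shot methods of this kind generally score an image by comparing its CLIP embedding with text embeddings of "normal" and "abnormal" prompts. The sketch below shows that generic scoring step, assuming that several prompt forms are combined by simply averaging their embeddings. It is not MFP-CLIP's implementation: the helper names (`ensemble`, `anomaly_score`), the toy 3-D vectors standing in for CLIP encoder outputs, and the temperature of 0.07 are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm, as CLIP-style embeddings are compared."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def ensemble(prompt_embs: Sequence[Sequence[float]]) -> Vector:
    """Average several prompt embeddings and re-normalize (a simple prompt ensemble)."""
    dim = len(prompt_embs[0])
    mean = [sum(p[i] for p in prompt_embs) / len(prompt_embs) for i in range(dim)]
    return normalize(mean)


def anomaly_score(image_emb: Sequence[float],
                  normal_prompts: Sequence[Sequence[float]],
                  abnormal_prompts: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> float:
    """Probability of 'abnormal' from a two-way softmax over cosine similarities.

    Hypothetical helper: the prompt embeddings stand in for text-encoder outputs
    of several prompt forms; they are NOT produced by MFP-CLIP here.
    """
    s_normal = cosine(image_emb, ensemble(normal_prompts)) / temperature
    s_abnormal = cosine(image_emb, ensemble(abnormal_prompts)) / temperature
    m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal, e_abnormal = math.exp(s_normal - m), math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


# Toy 3-D "embeddings" (assumed values, not real CLIP features).
normal_prompts = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
abnormal_prompts = [[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]]
print(anomaly_score([0.1, 1.0, 0.1], normal_prompts, abnormal_prompts))
print(anomaly_score([1.0, 0.0, 0.0], normal_prompts, abnormal_prompts))
```

An image embedding close to the averaged "abnormal" prompts receives a score near 1, and one close to the "normal" prompts a score near 0. Averaging is only the simplest way to pool several prompts; a multi-form scheme could weight or learn the combination instead.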

📝 Abstract

Recently, zero-shot anomaly detection (ZSAD) has emerged as a pivotal paradigm for identifying defects in unseen categories without requiring target samples during training. However, existing ZSAD methods struggle to delineate the boundaries of small and complex defects because their representations are insufficient. Most of them rely on a single manually designed prompt, which fails to generalize across diverse objects and anomalies. In this paper, we propose MFP-CLIP, a novel prompt-based CLIP framework that explores the efficacy of multi-form prompts for zero-shot industrial anomaly detection. We employ an image-to-text prompting (I2TP) mechanism to better represent the object in the image. MFP-CLIP enhances perception of multi-scale and complex anomalies through self-prompting (SP) and a multi-patch feature aggregation (MPFA) module. To localize defects precisely, we introduce a mask prompting (MP) module that guides the model to focus on potential anomaly regions. Extensive experiments on two widely used industrial anomaly detection benchmarks, MVTec AD and VisA, demonstrate MFP-CLIP's superiority in ZSAD.
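The abstract does not say how MPFA aggregates patch features. One common way to make patch-level scoring sensitive to defects of different sizes is to average each patch feature over neighbourhoods of several sizes, score each version against "normal" and "abnormal" text embeddings, and average the resulting maps. The sketch below shows that generic multi-scale idea under stated assumptions. It is not the paper's module: the function names, the scale set (1, 3, 5), the temperature, and the toy 2-D features are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # H x W grid of D-dimensional patch features


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def aggregate(grid: Grid, k: int) -> Grid:
    """Replace each patch feature by the mean over its k x k neighbourhood (k odd).

    Windows are clipped at the image border, so edge patches average fewer cells.
    """
    h, w, d = len(grid), len(grid[0]), len(grid[0][0])
    r = k // 2
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            acc, count = [0.0] * d, 0
            for y in range(max(0, i - r), min(h, i + r + 1)):
                for x in range(max(0, j - r), min(w, j + r + 1)):
                    for c in range(d):
                        acc[c] += grid[y][x][c]
                    count += 1
            row.append([v / count for v in acc])
        out.append(row)
    return out


def patch_score(feat, normal_text, abnormal_text, temperature=0.07) -> float:
    """Two-way softmax 'abnormal' probability for one patch feature."""
    s_n = cosine(feat, normal_text) / temperature
    s_a = cosine(feat, abnormal_text) / temperature
    m = max(s_n, s_a)
    return math.exp(s_a - m) / (math.exp(s_n - m) + math.exp(s_a - m))


def multi_patch_anomaly_map(grid: Grid, normal_text, abnormal_text,
                            scales: Tuple[int, ...] = (1, 3, 5)) -> List[List[float]]:
    """Average per-patch anomaly maps computed at several neighbourhood sizes."""
    h, w = len(grid), len(grid[0])
    total = [[0.0] * w for _ in range(h)]
    for k in scales:
        agg = aggregate(grid, k)
        for i in range(h):
            for j in range(w):
                total[i][j] += patch_score(agg[i][j], normal_text, abnormal_text)
    return [[v / len(scales) for v in row] for row in total]


# Toy 5 x 5 grid of 2-D features: normal-looking everywhere except one defect patch.
grid = [[[1.0, 0.0] for _ in range(5)] for _ in range(5)]
grid[2][2] = [0.0, 1.0]
anomaly_map = multi_patch_anomaly_map(grid, normal_text=[1.0, 0.0], abnormal_text=[0.0, 1.0])
print(round(anomaly_map[2][2], 3), round(anomaly_map[0][0], 3))
```

In this toy run the single-patch defect is scored highly at k = 1 but is diluted at larger k. This illustrates the trade-off any multi-scale aggregation has to balance: small windows preserve tiny defects, while larger ones capture context for bigger or irregular regions.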

Problem

Research questions and friction points this paper is trying to address.

Improves zero-shot anomaly detection for unseen categories

Addresses insufficient representation of small, complex defects

Enhances defect localization with multi-form prompt mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining multiple prompt forms (I2TP, SP, MP) with multi-patch feature aggregation enhances zero-shot anomaly detection.

Image to text prompting improves object representation.

Mask prompting localizes defects more precisely.
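The abstract says only that MP guides the model toward potential anomaly regions. As a hedged illustration of that idea, and not a description of MFP-CLIP's actual MP module, the sketch below thresholds a coarse anomaly map into a mask. It then pools patch features inside and outside the mask into two prototypes and re-scores every patch by its similarity to each prototype. The threshold `tau = 0.5`, the prototype-based re-scoring, and all names are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

Grid = List[List[List[float]]]  # H x W grid of D-dimensional patch features
Map = List[List[float]]         # H x W anomaly scores in [0, 1]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def masked_mean(grid: Grid, mask: List[List[int]], keep: int) -> Optional[List[float]]:
    """Mean feature over patches whose mask value equals `keep` (None if there are none)."""
    d = len(grid[0][0])
    acc, n = [0.0] * d, 0
    for i, row in enumerate(grid):
        for j, feat in enumerate(row):
            if mask[i][j] == keep:
                for c in range(d):
                    acc[c] += feat[c]
                n += 1
    return [v / n for v in acc] if n else None


def mask_guided_refine(grid: Grid, coarse_map: Map, tau: float = 0.5,
                       temperature: float = 0.07) -> Map:
    """Hypothetical mask-guided refinement (not the paper's MP module).

    1. Threshold the coarse map into a binary mask of candidate anomaly patches.
    2. Pool features inside / outside the mask into two prototypes.
    3. Re-score every patch by a softmax over its similarity to the two prototypes.
    """
    mask = [[1 if v >= tau else 0 for v in row] for row in coarse_map]
    fg = masked_mean(grid, mask, keep=1)
    bg = masked_mean(grid, mask, keep=0)
    if fg is None or bg is None:
        return coarse_map  # nothing (or everything) flagged: no guidance possible
    refined: Map = []
    for row in grid:
        out = []
        for feat in row:
            s_fg = cosine(feat, fg) / temperature
            s_bg = cosine(feat, bg) / temperature
            m = max(s_fg, s_bg)
            out.append(math.exp(s_fg - m) / (math.exp(s_fg - m) + math.exp(s_bg - m)))
        refined.append(out)
    return refined


# Toy 3 x 3 example: the coarse map flags (1, 1) but under-scores the similar patch (1, 2).
grid = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
grid[1][1] = [0.1, 1.0]
grid[1][2] = [0.2, 0.9]
coarse_map = [[0.1, 0.1, 0.1], [0.1, 0.8, 0.4], [0.1, 0.1, 0.1]]
refined = mask_guided_refine(grid, coarse_map)
print([round(v, 3) for v in refined[1]])
```

Patch (1, 2) resembles the masked region more than the background, so its refined score rises well above its coarse score of 0.4. Clearly normal patches stay near 0. The fallback keeps the coarse map unchanged when the mask is empty or covers every patch.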
