PatchEAD: Unifying Industrial Visual Prompting Frameworks for Patch-Exclusive Anomaly Detection

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Industrial anomaly detection increasingly relies on foundation models, yet existing methods focus mainly on text-prompt tuning, while visual prompting lacks a unified framework and remains tightly coupled to specific model architectures. This work proposes PatchEAD, a generic, training-free, purely patch-based visual prompting framework for industrial anomaly detection. Its core components are: (1) a foreground-mask-guided adaptive patch sampling mechanism, and (2) a cross-model feature alignment module that mitigates patch-similarity mismatches arising from divergent pretraining biases across foundation models. PatchEAD does not rely on textual prompts and supports plug-and-play deployment. The reported experiments show that PatchEAD outperforms prior work in both few-shot and batch zero-shot settings, supporting the effectiveness, robustness, and cross-model generalizability of pure visual patch prompting.

📝 Abstract

Industrial anomaly detection increasingly relies on foundation models, aiming for strong out-of-distribution generalization and rapid adaptation in real-world deployments. Notably, past studies have primarily focused on textual prompt tuning, leaving the intrinsic visual counterpart fragmented into processing steps specific to each foundation model. We aim to address this limitation by proposing a unified patch-focused framework, Patch-Exclusive Anomaly Detection (PatchEAD), enabling training-free anomaly detection that is compatible with diverse foundation models. The framework combines visual prompting techniques, including an alignment module and foreground masking. Our experiments show superior few-shot and batch zero-shot performance compared to prior work, despite the absence of textual features. Our study further examines how backbone structure and pretrained characteristics affect patch-similarity robustness, providing actionable guidance for selecting and configuring foundation models for real-world visual inspection. These results confirm that a well-unified patch-only framework can enable quick, calibration-light deployment without the need for carefully engineered textual prompts.

Problem

Research questions and friction points this paper is trying to address.

Unifying fragmented visual prompting frameworks for anomaly detection

Enabling training-free compatibility with diverse foundation models

Providing backbone selection guidance for robust patch-similarity analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified patch-focused framework for anomaly detection

Training-free method compatible with diverse foundation models

Visual prompting techniques including alignment and masking
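
To make the patch-only idea concrete, here is a minimal sketch of generic training-free patch-similarity scoring with an optional foreground mask. It is not the PatchEAD implementation: the function names, the cosine nearest-neighbor score, the top-fraction pooling, and the `top_fraction` default are all assumptions chosen for illustration, and the alignment module is not modeled because the summary does not specify it. Patch features are assumed to have been extracted beforehand by some foundation-model backbone.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def patch_anomaly_scores(query_patches, reference_patches, foreground_mask=None):
    """Score each query patch by its distance to the nearest reference patch.

    query_patches:     (Nq, D) patch features of the test image.
    reference_patches: (Nr, D) patch features from normal (defect-free) images.
    foreground_mask:   optional (Nq,) boolean array; False marks background.
    Returns an (Nq,) array of scores, where higher means more anomalous.
    """
    q = l2_normalize(np.asarray(query_patches, dtype=np.float64))
    r = l2_normalize(np.asarray(reference_patches, dtype=np.float64))
    cos_sim = q @ r.T                       # (Nq, Nr) cosine similarities
    scores = 1.0 - cos_sim.max(axis=1)      # distance to nearest normal patch
    if foreground_mask is not None:
        # Assumption: background patches are simply zeroed out.
        scores = np.where(np.asarray(foreground_mask, dtype=bool), scores, 0.0)
    return scores


def image_anomaly_score(patch_scores, top_fraction=0.01):
    """Pool patch scores into one image-level score (mean of the top fraction)."""
    k = max(1, int(round(top_fraction * patch_scores.size)))
    return float(np.sort(patch_scores)[-k:].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(200, 16))  # stand-in for normal patch features
    query = rng.normal(size=(50, 16))       # stand-in for test patch features
    mask = np.ones(50, dtype=bool)
    mask[:10] = False                       # pretend the first 10 are background
    scores = patch_anomaly_scores(query, reference, mask)
    print(scores.shape, image_anomaly_score(scores))
```

Because the scoring is a nearest-neighbor lookup against reference patches, swapping the backbone only changes how the features are produced, which is the property a model-agnostic patch framework depends on.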

🔎 Similar Papers

AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2