IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning

πŸ“… 2025-08-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current few-shot industrial anomaly detection (FS-IAD) methods rely on large vision-language models (LVLMs) but lack domain-specific industrial priors and structured reasoning capabilities, limiting their performance relative to human inspectors. To address this, the authors propose IADGPT, a unified vision-language model tailored for FS-IAD that jointly performs anomaly detection, pixel-level localization, and anomaly reasoning. The approach introduces a three-stage progressive training strategy and an in-context learning-based training paradigm, enabling effective injection of industrial knowledge and few-shot adaptation to novel product categories from a handful of exemplar images. IADGPT derives image-level anomaly scores from its output logits and pixel-level anomaly maps from its attention, in conjunction with language output for anomaly reasoning. Trained on a new industrial dataset comprising 100K images across 400 product categories with attribute-level textual annotations, IADGPT achieves considerable gains in anomaly detection over prior methods and is competitive in localization and reasoning, indicating strong generalization and practical deployability in real-world industrial settings.

πŸ“ Abstract
Few-Shot Industrial Anomaly Detection (FS-IAD) has important applications in automating industrial quality inspection. Recently, several FS-IAD methods based on Large Vision-Language Models (LVLMs) have been proposed, achieving some success through prompt learning or fine-tuning. However, existing LVLMs focus on general tasks and lack the basic industrial knowledge and reasoning capabilities relevant to FS-IAD, leaving these methods far behind specialized human quality inspectors. To address these challenges, we propose a unified framework, IADGPT, designed to perform FS-IAD in a human-like manner while also handling the associated localization and reasoning tasks, even for diverse and novel industrial products. To this end, we introduce a three-stage progressive training strategy inspired by human learning. Specifically, the first two stages gradually guide IADGPT in acquiring fundamental industrial knowledge and discrepancy awareness. In the third stage, we design an in-context learning-based training paradigm, enabling IADGPT to leverage a few images as exemplars for improved generalization to novel products. In addition, we design a strategy that enables IADGPT to output image-level and pixel-level anomaly scores using the logits output and the attention map, respectively, in conjunction with the language output to accomplish anomaly reasoning. To support our training, we present a new dataset comprising 100K images across 400 diverse industrial product categories with extensive attribute-level textual annotations. Experiments indicate IADGPT achieves considerable performance gains in anomaly detection and demonstrates competitiveness in anomaly localization and reasoning. We will release our dataset upon camera-ready publication.
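As a rough illustration of the scoring strategy described above (not the paper's actual decoding code; the token ids, the 14Γ—14 patch grid, and the min-max normalisation are all assumptions), the image-level score can be read off a softmax over the logits of the "normal" vs. "anomalous" answer tokens, and the pixel-level map obtained by upsampling a patch-grid attention map to image resolution:

```python
import numpy as np

def image_level_score(logits, normal_id, anomalous_id):
    """Probability mass on the 'anomalous' answer token, from a
    two-way softmax over the logits of the two candidate tokens."""
    pair = np.array([logits[normal_id], logits[anomalous_id]])
    pair = pair - pair.max()                 # numerical stability
    probs = np.exp(pair) / np.exp(pair).sum()
    return probs[1]

def pixel_level_map(attn, image_hw):
    """Upsample an (h, w) patch-grid attention map to image size by
    nearest-neighbour replication, then min-max normalise to [0, 1]."""
    H, W = image_hw
    h, w = attn.shape
    up = np.kron(attn, np.ones((H // h, W // w)))
    return (up - up.min()) / (up.max() - up.min() + 1e-8)

# Toy example: a 5-token vocabulary where ids 3/4 stand in for the
# hypothetical 'normal'/'anomalous' answer tokens.
logits = np.array([0.1, -0.2, 0.0, 1.0, 2.0])
score = image_level_score(logits, normal_id=3, anomalous_id=4)

rng = np.random.default_rng(0)
attn = rng.random((14, 14))                  # assumed 14x14 patch grid
amap = pixel_level_map(attn, (224, 224))     # 224x224 anomaly heat map
```

In this sketch the same forward pass yields both outputs: no separate segmentation head is assumed, which matches the abstract's claim that logits and attention are reused for scoring.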
Problem

Research questions and friction points this paper is trying to address.

Enhancing few-shot industrial anomaly detection with LVLMs
Improving localization and reasoning for diverse industrial products
Addressing lack of industrial knowledge in existing LVLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage progressive training strategy
In-context learning-based training paradigm
Image-level and pixel-level anomaly scoring
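The in-context learning paradigm in the second bullet can be pictured as prompt assembly: a few defect-free reference images of the novel product precede the query image, so the model compares rather than memorises. The message schema below is purely hypothetical (the paper does not publish its prompt format), meant only to show the exemplars-then-query structure:

```python
def build_fs_prompt(normal_refs, query, k=2):
    """Assemble a hypothetical in-context prompt: up to k normal
    reference images serve as exemplars before the query image."""
    msgs = []
    for i, ref in enumerate(normal_refs[:k]):
        msgs.append({
            "role": "user",
            "content": [
                {"type": "image", "path": ref},
                {"type": "text", "text": f"Reference {i + 1}: a defect-free sample."},
            ],
        })
    msgs.append({
        "role": "user",
        "content": [
            {"type": "image", "path": query},
            {"type": "text",
             "text": "Compare with the references. Is this sample anomalous? "
                     "If so, localise and explain the defect."},
        ],
    })
    return msgs

prompt = build_fs_prompt(["ref_0.png", "ref_1.png"], "query.png")
```

Because the exemplars are ordinary inputs rather than fine-tuning data, the same trained model adapts to an unseen product category at inference time just by swapping the reference images.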