IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning

πŸ“… 2025-08-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current few-shot industrial anomaly detection (FS-IAD) methods rely on large vision-language models (LVLMs) but lack domain-specific industrial priors and structured reasoning capabilities, limiting their performance relative to human inspectors. To address this, the authors propose IADGPT, a unified vision-language model tailored for FS-IAD that jointly performs anomaly detection, pixel-level localization, and anomaly reasoning. The approach introduces a three-stage progressive training strategy and an in-context learning-based training paradigm, enabling effective injection of industrial knowledge and few-shot adaptation to novel product categories from a handful of exemplar images. IADGPT derives image-level anomaly scores from its output logits and pixel-level anomaly maps from its attention, in conjunction with language output for anomaly reasoning. Trained on a new industrial dataset comprising 100K images across 400 product categories with attribute-level textual annotations, IADGPT achieves considerable gains in anomaly detection over prior methods and is competitive in localization and reasoning, indicating strong generalization and practical deployability in real-world industrial settings.

πŸ“ Abstract
Few-Shot Industrial Anomaly Detection (FS-IAD) has important applications in automating industrial quality inspection. Recently, several FS-IAD methods based on Large Vision-Language Models (LVLMs) have been proposed, achieving some success through prompt learning or fine-tuning. However, existing LVLMs focus on general tasks and lack the basic industrial knowledge and reasoning capabilities relevant to FS-IAD, leaving these methods far behind specialized human quality inspectors. To address these challenges, we propose a unified framework, IADGPT, designed to perform FS-IAD in a human-like manner while also handling the associated localization and reasoning tasks, even for diverse and novel industrial products. To this end, we introduce a three-stage progressive training strategy inspired by human learning. Specifically, the first two stages gradually guide IADGPT in acquiring fundamental industrial knowledge and discrepancy awareness. In the third stage, we design an in-context learning-based training paradigm, enabling IADGPT to leverage a few images as exemplars for improved generalization to novel products. In addition, we design a strategy that enables IADGPT to output image-level and pixel-level anomaly scores using the logits output and the attention map, respectively, in conjunction with the language output to accomplish anomaly reasoning. To support our training, we present a new dataset comprising 100K images across 400 diverse industrial product categories with extensive attribute-level textual annotations. Experiments indicate IADGPT achieves considerable performance gains in anomaly detection and demonstrates competitiveness in anomaly localization and reasoning. We will release our dataset upon camera-ready publication.
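As a rough illustration of the scoring strategy described above (not the paper's actual decoding code; the token ids, the 14Γ—14 patch grid, and the min-max normalisation are all assumptions), the image-level score can be read off a softmax over the logits of the "normal" vs. "anomalous" answer tokens, and the pixel-level map obtained by upsampling a patch-grid attention map to image resolution:

```python
import numpy as np

def image_level_score(logits, normal_id, anomalous_id):
    """Probability mass on the 'anomalous' answer token, from a
    two-way softmax over the logits of the two candidate tokens."""
    pair = np.array([logits[normal_id], logits[anomalous_id]])
    pair = pair - pair.max()                 # numerical stability
    probs = np.exp(pair) / np.exp(pair).sum()
    return probs[1]

def pixel_level_map(attn, image_hw):
    """Upsample an (h, w) patch-grid attention map to image size by
    nearest-neighbour replication, then min-max normalise to [0, 1]."""
    H, W = image_hw
    h, w = attn.shape
    up = np.kron(attn, np.ones((H // h, W // w)))
    return (up - up.min()) / (up.max() - up.min() + 1e-8)

# Toy example: a 5-token vocabulary where ids 3/4 stand in for the
# hypothetical 'normal'/'anomalous' answer tokens.
logits = np.array([0.1, -0.2, 0.0, 1.0, 2.0])
score = image_level_score(logits, normal_id=3, anomalous_id=4)

rng = np.random.default_rng(0)
attn = rng.random((14, 14))                  # assumed 14x14 patch grid
amap = pixel_level_map(attn, (224, 224))     # 224x224 anomaly heat map
```

In this sketch the same forward pass yields both outputs: no separate segmentation head is assumed, which matches the abstract's claim that logits and attention are reused for scoring.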
Problem

Research questions and friction points this paper is trying to address.

Enhancing few-shot industrial anomaly detection with LVLMs
Improving localization and reasoning for diverse industrial products
Addressing lack of industrial knowledge in existing LVLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage progressive training strategy
In-context learning-based training paradigm
Image-level and pixel-level anomaly scoring
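The in-context learning paradigm in the second bullet can be pictured as prompt assembly: a few defect-free reference images of the novel product precede the query image, so the model compares rather than memorises. The message schema below is purely hypothetical (the paper does not publish its prompt format), meant only to show the exemplars-then-query structure:

```python
def build_fs_prompt(normal_refs, query, k=2):
    """Assemble a hypothetical in-context prompt: up to k normal
    reference images serve as exemplars before the query image."""
    msgs = []
    for i, ref in enumerate(normal_refs[:k]):
        msgs.append({
            "role": "user",
            "content": [
                {"type": "image", "path": ref},
                {"type": "text", "text": f"Reference {i + 1}: a defect-free sample."},
            ],
        })
    msgs.append({
        "role": "user",
        "content": [
            {"type": "image", "path": query},
            {"type": "text",
             "text": "Compare with the references. Is this sample anomalous? "
                     "If so, localise and explain the defect."},
        ],
    })
    return msgs

prompt = build_fs_prompt(["ref_0.png", "ref_1.png"], "query.png")
```

Because the exemplars are ordinary inputs rather than fine-tuning data, the same trained model adapts to an unseen product category at inference time just by swapping the reference images.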