Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models

📅 2025-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Industrial anomaly detection has long emphasized anomaly localization and segmentation, while fine-grained anomaly classification (distinguishing specific anomaly types) remains underexplored. This paper proposes VELM, a multimodal large language model pipeline built around a “detection–classification–response” three-stage paradigm: an unsupervised visual module (e.g., PatchCore) first localizes anomalous regions; a vision-language joint encoder coupled with an LLM then performs fine-grained classification. To support this task, the paper introduces MVTec-AC and VisA-AC, refined versions of MVTec-AD and VisA annotated with fine-grained anomaly categories, filling a labeling gap in existing benchmarks. It also designs a cross-dataset transfer adaptation mechanism. Experiments report state-of-the-art results: 80.4% classification accuracy on MVTec-AD (5% above prior baselines) and 84% on MVTec-AC. The work moves industrial anomaly understanding from binary detection (“anomalous or not”) toward semantic interpretation (“what type of anomaly”).

📝 Abstract

Recent advances in visual industrial anomaly detection have demonstrated exceptional performance in identifying and segmenting anomalous regions while maintaining fast inference speeds. However, anomaly classification, i.e., distinguishing different types of anomalies, remains largely unexplored despite its critical importance in real-world inspection tasks. To address this gap, we propose VELM, a novel LLM-based pipeline for anomaly classification. Given the critical importance of inference speed, we first apply an unsupervised anomaly detection method as a vision expert to assess the normality of an observation. If an anomaly is detected, the LLM then classifies its type. A key challenge in developing and evaluating anomaly classification models is the lack of precise annotations of anomaly classes in existing datasets. To address this limitation, we introduce MVTec-AC and VisA-AC, refined versions of the widely used MVTec-AD and VisA datasets, which include accurate anomaly class labels for rigorous evaluation. Our approach achieves a state-of-the-art anomaly classification accuracy of 80.4% on MVTec-AD, exceeding the prior baselines by 5%, and 84% on MVTec-AC, demonstrating the effectiveness of VELM in understanding and categorizing anomalies. We hope our methodology and benchmark inspire further research in anomaly classification, helping bridge the gap between detection and comprehensive anomaly characterization.
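
The detect-then-classify gating described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `inspect`, `InspectionResult`, the `threshold` value, and the toy scoring and classification functions are all hypothetical stand-ins. In a real system the scorer might be an unsupervised detector such as PatchCore and the classifier a multimodal LLM call. The sketch only shows the control flow: the more expensive classifier runs only when the detector flags a sample.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class InspectionResult:
    anomaly_score: float
    is_anomalous: bool
    anomaly_type: Optional[str]  # None when the sample is judged normal


def inspect(
    image: Sequence[float],
    score_fn: Callable[[Sequence[float]], float],
    classify_fn: Callable[[Sequence[float]], str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> InspectionResult:
    """Run the cheap detector first; call the classifier only on anomalies."""
    score = score_fn(image)
    if score < threshold:
        # Judged normal: skip the expensive classification stage entirely.
        return InspectionResult(score, False, None)
    return InspectionResult(score, True, classify_fn(image))


# Toy stand-ins (assumptions for illustration only).
def toy_score(image: Sequence[float]) -> float:
    # Pretend the largest pixel value is the anomaly score.
    return max(image)


def toy_classify(image: Sequence[float]) -> str:
    # Pretend the classifier separates two defect types by total intensity.
    return "scratch" if sum(image) > 1.0 else "contamination"


if __name__ == "__main__":
    print(inspect([0.1, 0.2], toy_score, toy_classify))
    print(inspect([0.9, 0.4], toy_score, toy_classify))
```

Passing the detector and classifier as callables keeps the gating logic independent of any particular model, so either stage can be swapped without touching the control flow.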

Problem

Research questions and friction points this paper is trying to address.

Classifying industrial anomalies using multi-modal LLMs

Addressing lack of anomaly class labels in datasets

Improving anomaly classification accuracy beyond detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised anomaly detection for initial assessment

LLM-based pipeline for anomaly type classification

Enhanced datasets with precise anomaly class labels
