Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process

📅 2025-03-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
Large multimodal models (LMMs) generalize far worse in industrial anomaly detection (IAD) than on general-purpose tasks. The paper attributes this to two causes: LMMs do not attend sufficiently to defect regions in the visual modality, and existing methods do not reason about what causes a defect. To address both, the paper proposes Triad, which combines three elements: (1) a manufacturing-driven IAD paradigm; (2) a modified LLaVA AnyRes structure, acting as an expert-guided region-of-interest tokenizer, that passes the LMM the potentially anomalous areas identified by existing IAD models; and (3) the InstructIAD instruction-tuning dataset together with CoT-M, a data organization approach for chain-of-thought grounded in the manufacturing process. According to the abstract, experiments show that Triad is competitive with current LMMs and that its accuracy improves further when manufacturing process information is supplied. The authors state that source code, training data, and pre-trained models will be made publicly available.

📝 Abstract
Although recent methods have tried to introduce large multimodal models (LMMs) into industrial anomaly detection (IAD), their generalization in the IAD field is far inferior to that for general purposes. We summarize the main reasons for this gap into two aspects. On one hand, general-purpose LMMs lack cognition of defects in the visual modality, thereby failing to sufficiently focus on defect areas. Therefore, we propose to modify the AnyRes structure of the LLaVA model, providing the potential anomalous areas identified by existing IAD models to the LMMs. On the other hand, existing methods mainly focus on identifying defects by learning defect patterns or comparing with normal samples, yet they fall short of understanding the causes of these defects. Considering that the generation of defects is closely related to the manufacturing process, we propose a manufacturing-driven IAD paradigm. An instruction-tuning dataset for IAD (InstructIAD) and a data organization approach for Chain-of-Thought with manufacturing (CoT-M) are designed to leverage the manufacturing process for IAD. Based on the above two modifications, we present Triad, a novel LMM-based method incorporating an expert-guided region-of-interest tokenizer and manufacturing process for industrial anomaly detection. Extensive experiments show that our Triad not only demonstrates competitive performance against current LMMs but also achieves further improved accuracy when equipped with manufacturing processes. Source code, training data, and pre-trained models will be publicly available at https://github.com/tzjtatata/Triad.

Problem

Research questions and friction points this paper is trying to address.

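General-purpose LMMs lack defect awareness in the visual modality and fail to focus sufficiently on defect areas.
Existing methods detect defects by learning defect patterns or comparing with normal samples, but do not understand what causes them.
LMM generalization in industrial anomaly detection is far inferior to that on general-purpose tasks.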

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modified AnyRes structure for LLaVA model
Manufacturing-driven IAD paradigm introduced
Expert-guided visual tokenizer for anomaly detection
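
The expert-guided tokenizer idea can be illustrated with a minimal sketch. The abstract only says that potentially anomalous areas identified by existing IAD models are provided to the LMM through a modified AnyRes structure; it does not specify how the region is chosen. The Python below therefore rests on assumptions: a hypothetical expert model that outputs a per-pixel anomaly map with scores in [0, 1], and a simple selection rule (threshold the map, take the bounding box of the high-scoring pixels, pad it by a margin). The names `roi_from_anomaly_map`, `crop`, `threshold`, and `margin` are illustrative and do not come from Triad.

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def roi_from_anomaly_map(
    anomaly_map: List[List[float]],
    threshold: float = 0.5,
    margin: int = 1,
) -> Optional[Box]:
    """Return a padded bounding box around pixels scoring >= threshold.

    Assumption: `anomaly_map` comes from some existing IAD model (hypothetical
    here) and is a rectangular 2D grid. Returns None if nothing is suspicious.
    """
    height = len(anomaly_map)
    width = len(anomaly_map[0]) if height else 0

    # Rows and columns that contain at least one high-scoring pixel.
    rows = [y for y in range(height)
            if any(v >= threshold for v in anomaly_map[y])]
    cols = [x for x in range(width)
            if any(anomaly_map[y][x] >= threshold for y in range(height))]
    if not rows or not cols:
        return None

    # Pad the tight box by `margin` and clamp it to the map bounds.
    left = max(min(cols) - margin, 0)
    top = max(min(rows) - margin, 0)
    right = min(max(cols) + 1 + margin, width)
    bottom = min(max(rows) + 1 + margin, height)
    return (left, top, right, bottom)


def crop(image: List[List[float]], box: Box) -> List[List[float]]:
    """Cut the region described by `box` out of a 2D grid."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    # Toy 5x6 anomaly map with two high-scoring pixels.
    amap = [[0.0] * 6 for _ in range(5)]
    amap[2][3] = 0.9
    amap[3][4] = 0.7
    print(roi_from_anomaly_map(amap))  # (2, 1, 6, 5)
```

In an AnyRes-style pipeline, the cropped region would presumably be encoded as an additional view next to the global image. That wiring is left out here because the source does not describe it.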