AI Summary
General-purpose multimodal large language models struggle to identify category-specific, fine-grained defects in industrial settings, which limits both detection accuracy and interpretability. To address this, the work proposes a knowledge-guided dynamic latent-space reasoning framework that combines three elements: retrieval-augmented category-specific textual knowledge, an entropy-driven implicit chain-of-thought mechanism built on optimizable latent tokens, and an information-theoretic dynamic visual patch injection strategy. The approach improves the model's fine-grained understanding of industrial anomalies and outperforms state-of-the-art methods across multiple benchmarks while providing interpretable decision rationales.
Abstract
Industrial anomaly detection demands precise reasoning over fine-grained defect patterns. However, existing multimodal large language models (MLLMs), pretrained on general-domain data, often struggle to capture category-specific anomalies, thereby limiting both detection accuracy and interpretability. To address these limitations, we propose Reason-IAD, a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection. Reason-IAD comprises two core components. First, a retrieval-augmented knowledge module incorporates category-specific textual descriptions into the model input, enabling context-aware reasoning over domain-specific defects. Second, an entropy-driven latent reasoning mechanism conducts iterative exploration within a compact latent space using optimizable latent think tokens, guided by an entropy-based reward that encourages confident and stable predictions. Furthermore, a dynamic visual injection strategy selectively incorporates the most informative image patches into the latent sequence, directing the reasoning process toward regions critical for anomaly detection. Extensive experimental results demonstrate that Reason-IAD consistently outperforms state-of-the-art methods. The code will be publicly available at https://github.com/chenpeng052/Reason-IAD.
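The abstract does not spell out the entropy-based reward or the patch-selection criterion, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the reward is the negative Shannon entropy of the model's softmax output (so confident predictions score higher) and that each image patch already carries a scalar informativeness score. The names `entropy_reward`, `select_patches`, and `patch_scores` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(logits):
    """Assumed reward: negative entropy, so lower uncertainty gives a higher reward."""
    return -entropy(softmax(logits))


def select_patches(patch_scores, k):
    """Return the indices of the k highest-scoring patches, in original order.

    The scoring function itself is an assumption; the abstract only says that
    the most informative patches are injected into the latent sequence.
    """
    ranked = sorted(range(len(patch_scores)),
                    key=lambda i: patch_scores[i],
                    reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    uncertain = [0.0, 0.0]   # e.g. normal vs. anomalous, undecided
    confident = [10.0, 0.0]  # strongly favors the first class
    print(entropy_reward(uncertain), entropy_reward(confident))
    print(select_patches([0.1, 0.9, 0.5, 0.7], k=2))
```

In a setup like this, the latent think tokens would be updated to increase `entropy_reward`, and the patches returned by `select_patches` would be appended to the latent sequence between reasoning iterations. How Reason-IAD actually performs either step should be checked against the released code.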