🤖 AI Summary
Existing interpretable anomaly detection methods struggle to distinguish fine-grained anomaly types (e.g., “crack” vs. “scratch”) and require object-class-specific model training, which incurs high deployment costs. Large vision-language models (VLMs) offer semantic expressiveness, but their prohibitive computational overhead hinders real-time or embedded deployment.
Method: We propose a lightweight convolutional framework that, for the first time, generates multi-channel anomaly heatmaps within a single inference pipeline, with each channel encoding a distinct anomaly type, using only image-level supervision. Vision-language priors guide cross-channel semantic alignment, eliminating class-specific modeling.
Contribution/Results: On Real-IAD, our method matches state-of-the-art performance in joint anomaly localization and classification across object categories and anomaly types, with 67% fewer parameters and 3.2× faster inference—significantly enhancing edge-deployment feasibility.
📝 Abstract
Most explainable anomaly detection methods identify anomalies but cannot differentiate the type of anomaly. Furthermore, they often require costly training and maintenance of separate models for each object category. This lack of specificity is a significant research gap, as identifying the type of anomaly (e.g., "Crack" vs. "Scratch") is crucial for accurate diagnosis that facilitates cost-saving operational decisions across diverse application domains. While some recent large-scale Vision-Language Models (VLMs) have begun to address this, they are computationally intensive and memory-heavy, restricting their use in real-time or embedded systems. We propose MultiTypeFCDD, a simple and lightweight convolutional framework designed as a practical alternative for explainable multi-type anomaly detection. MultiTypeFCDD uses only image-level labels to learn and produce multi-channel heatmaps, where each channel is trained to correspond to a specific anomaly type. The model functions as a single, unified framework capable of differentiating anomaly types across multiple object categories, eliminating the need to train and manage separate models for each object category. We evaluated the proposed method on the Real-IAD dataset, where it delivers results competitive with state-of-the-art complex models at a significantly reduced parameter count and inference time. This makes it a practical solution for real-world applications where computational resources are tightly constrained.
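The idea of training one heatmap channel per anomaly type from image-level labels can be illustrated with a small sketch. The abstract does not state the training objective, so the code below assumes a per-channel variant of the original FCDD loss (pseudo-Huber scores averaged over the heatmap, pushed toward zero when a type is absent and away from zero when it is present). The function names, the `eps` constant, the threshold, and the toy 2×2 maps are all hypothetical and are not taken from the paper.

```python
import math


def pseudo_huber(value):
    """Map a raw network output to a non-negative anomaly score."""
    return math.sqrt(value * value + 1.0) - 1.0


def channel_score(raw_map):
    """Mean pseudo-Huber score over one channel (a 2D list of raw outputs)."""
    scores = [pseudo_huber(v) for row in raw_map for v in row]
    return sum(scores) / len(scores)


def multi_type_loss(raw_maps, type_labels, eps=1e-9):
    """Assumed FCDD-style loss, averaged over anomaly-type channels.

    raw_maps    : list of C channels, each a 2D list of raw outputs
    type_labels : list of C image-level labels (1 = type present, 0 = absent)
    """
    total = 0.0
    for raw_map, label in zip(raw_maps, type_labels):
        score = channel_score(raw_map)
        if label == 1:
            # Type present: reward a large mean score on this channel.
            total += -math.log(1.0 - math.exp(-score) + eps)
        else:
            # Type absent: pull this channel's scores toward zero.
            total += score
    return total / len(raw_maps)


def predict_types(raw_maps, type_names, threshold):
    """Report every anomaly type whose channel score exceeds the threshold."""
    return [
        name
        for name, raw_map in zip(type_names, raw_maps)
        if channel_score(raw_map) > threshold
    ]


# Toy example: two channels ("crack", "scratch") over a 2x2 output map.
type_names = ["crack", "scratch"]
raw_maps = [
    [[0.0, 0.0], [0.0, 0.0]],  # "crack" channel is silent
    [[3.0, 0.0], [0.0, 0.0]],  # "scratch" channel fires at one location
]
loss_matching = multi_type_loss(raw_maps, [0, 1])    # labels agree with maps
loss_mismatched = multi_type_loss(raw_maps, [1, 0])  # labels disagree
detected = predict_types(raw_maps, type_names, threshold=0.5)
```

In this sketch the loss is lower when the image-level labels agree with the channel activations, and the per-channel scores double as an image-level type classifier, while the individual pixel scores serve as the explanation heatmap for each type.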