🤖 AI Summary
Existing Visual Anomaly Detection (VAD) models enable unsupervised localization of anomalous regions but lack semantically interpretable explanations. To address this, we propose CONVAD, the first concept bottleneck model (CBM) tailored for VAD. CONVAD introduces a dedicated concept dataset, extends the CBM architecture to jointly produce pixel-level anomaly heatmaps and human-readable concept descriptions (e.g., "scratch", "stain"), and incorporates a controllable synthetic-anomaly generation pipeline that mitigates the scarcity of real-world anomalous samples. The method integrates self-supervised representation learning, attention-guided concept extraction, and dual-path explanation generation. Experiments show that CONVAD matches state-of-the-art VAD methods in detection performance while uniquely delivering *dual interpretability*: precise spatial localization *and* semantically grounded, human-understandable concept explanations. This improves trustworthiness and the efficiency of human-AI collaboration in industrial inspection and other safety-critical applications.
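To make the "dual interpretability" idea concrete, here is a minimal NumPy sketch of a concept-bottleneck head that yields both explanations from the same tensor: per-pixel concept activations give a spatial anomaly heatmap (max over concepts) and image-level concept scores (max over pixels). The shapes, concept names, and random weights are illustrative assumptions, not CONVAD's actual architecture or learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: an 8x8 feature map with 32-dim features and
# 4 named concepts (names/weights are illustrative, not the paper's).
H, W, D, K = 8, 8, 32, 4
CONCEPTS = ["scratch", "stain", "dent", "missing_part"]

features = rng.normal(size=(H, W, D))        # backbone feature map
concept_weights = rng.normal(size=(D, K))    # stand-in concept directions

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Concept bottleneck: every pixel is scored against each concept,
# so one tensor feeds both explanation paths.
pixel_concepts = sigmoid(features @ concept_weights)        # (H, W, K)

# Visual explanation: per-pixel anomaly heatmap (max over concepts).
heatmap = pixel_concepts.max(axis=-1)                       # (H, W)

# Semantic explanation: image-level concept scores (max over pixels).
image_concepts = pixel_concepts.reshape(-1, K).max(axis=0)  # (K,)

for name, score in zip(CONCEPTS, image_concepts):
    print(f"{name}: {score:.2f}")
```

Because both outputs are reductions of the same activation tensor, the strongest concept response and the hottest heatmap pixel necessarily agree, which is what ties the semantic and spatial explanations together.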
📝 Abstract
In recent years, Visual Anomaly Detection (VAD) has gained significant attention due to its ability to identify anomalous images using only normal images during training. Many VAD models work without supervision but are still able to provide visual explanations by highlighting the anomalous regions within an image. However, although these visual explanations can be helpful, they lack a direct and semantically meaningful interpretation for users. To address this limitation, we propose extending Concept Bottleneck Models (CBMs) to the VAD setting. By learning meaningful concepts, the network can provide human-interpretable descriptions of anomalies, offering a novel and more insightful way to explain them. Our contributions are threefold: (i) we develop a Concept Dataset to support research on CBMs for VAD; (ii) we improve the CBM architecture to generate both concept-based and visual explanations, bridging semantic and localization interpretability; and (iii) we introduce a pipeline for synthesizing artificial anomalies, preserving the VAD paradigm of minimizing dependence on rare anomalous samples. Our approach, Concept-Aware Visual Anomaly Detection (CONVAD), achieves performance comparable to classic VAD methods while providing richer, concept-driven explanations that enhance interpretability and trust in VAD systems.
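Contribution (iii) relies on synthesizing artificial anomalies so that training never depends on rare real defects. A common generic recipe (in the spirit of cut-paste augmentation; this is a stand-in sketch, not CONVAD's actual pipeline) copies a random patch of a normal image to another location and records the pasted region as a pixel-level ground-truth mask:

```python
import numpy as np

def synth_anomaly(img, rng, patch=4):
    """Cut-paste style synthetic anomaly: copy a random patch of a
    normal image to a random destination and return the perturbed
    image plus a pixel-level anomaly mask. Illustrative only."""
    h, w = img.shape[:2]
    out = img.copy()
    sy, sx = rng.integers(0, h - patch), rng.integers(0, w - patch)
    dy, dx = rng.integers(0, h - patch), rng.integers(0, w - patch)
    out[dy:dy + patch, dx:dx + patch] = img[sy:sy + patch, sx:sx + patch]
    mask = np.zeros((h, w), dtype=bool)
    mask[dy:dy + patch, dx:dx + patch] = True
    return out, mask

rng = np.random.default_rng(1)
normal = rng.random((16, 16))          # stand-in for a normal image
anomalous, gt_mask = synth_anomaly(normal, rng)
print(gt_mask.sum())                   # number of anomalous pixels
```

The mask supplies free localization labels, so heatmap supervision is possible without ever collecting real anomalous samples, preserving the VAD paradigm the abstract describes.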