AI Summary
This work addresses the limited generalization and reliance on post-processing in conventional anomaly localization methods for industrial visual inspection, which often fail to focus on critical regions of interest (ROIs). To overcome these limitations, the authors propose a novel network architecture that integrates generative, reconstructive, and discriminative mechanisms. For the first time, an ROI attention module is embedded within a residual autoencoder-based GAN framework, enabling joint training on both synthetic defective and normal samples to guide the model toward authentic defect regions. The method eliminates the need for post-processing and achieves high-precision anomaly localization on the MVTec AD benchmark and a real-world BFS vial strip dataset from the pharmaceutical industry, significantly enhancing both generalization capability and localization accuracy in complex industrial scenarios.
Abstract
Anomaly detection is increasingly used in industrial applications and processes. One of its main application fields is visual inspection for surface anomaly detection, which aims to spot regions that deviate from regularity and thereby identify abnormal products. Defect localization is a key task that is usually achieved by a basic comparison between the generated image and the original one, followed by blob analysis or image-editing algorithms in a postprocessing step. Such postprocessing is heavily biased towards the source dataset and does not generalize. Furthermore, in industrial applications the whole image is not always of interest: often only one or a few regions of interest (ROIs) contain anomalies that are relevant to spot. For these reasons, we propose a new architecture composed of two blocks. The first block is a generative adversarial network (GAN) based on a residual autoencoder (ResAE) that performs reconstruction and denoising, while the second block produces an image segmentation that spots defects. The method learns from a dataset composed of good products and generated synthetic defects. The discriminative network is trained using an ROI for each image in the training dataset, so that the network learns in which areas anomalies are relevant. This approach reduces the need for preprocessing algorithms formerly developed with blob analysis and image-editing procedures. To test our model, we used the challenging MVTec anomaly detection datasets and a large industrial dataset of pharmaceutical BFS strips of vials. The latter dataset constitutes a more realistic use case for the proposed network.
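One way to picture how an ROI can steer the discriminative block is a segmentation loss that is evaluated only inside the ROI. The abstract does not state the loss actually used, so the following is a minimal sketch under assumptions: the function name `roi_masked_bce`, the choice of binary cross-entropy, and the toy 4x4 arrays are all hypothetical and serve only as illustration.

```python
import numpy as np


def roi_masked_bce(pred, target, roi, eps=1e-7):
    """Mean binary cross-entropy over pixels inside the ROI only.

    Hypothetical helper, not taken from the paper.
    pred   : predicted defect probabilities in [0, 1], shape (H, W)
    target : defect mask (0 or 1), e.g. from synthetic defects, shape (H, W)
    roi    : region-of-interest mask (0 or 1), shape (H, W)
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    roi = roi.astype(bce.dtype)
    # Pixels outside the ROI get zero weight, so errors there are not penalized.
    return float((bce * roi).sum() / max(roi.sum(), 1.0))


# Toy example: the ROI is the left half of a 4x4 image.
roi = np.zeros((4, 4))
roi[:, :2] = 1
target = np.zeros((4, 4))
target[1, 0] = 1  # one synthetic defect pixel inside the ROI

pred_good = np.where(target == 1, 0.9, 0.1)
pred_noisy = pred_good.copy()
pred_noisy[:, 2:] = 0.9  # false alarms, but only outside the ROI

loss_good = roi_masked_bce(pred_good, target, roi)
loss_noisy = roi_masked_bce(pred_noisy, target, roi)
```

In this toy setup `loss_good` and `loss_noisy` coincide, because the two predictions differ only outside the ROI. Under such a loss the segmentation output is constrained only where anomalies are declared relevant.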