🤖 AI Summary
This work addresses the challenge of deploying high-performance anomaly detection models in industrial quality inspection, where defective samples are scarce and edge devices face stringent resource constraints. The self-supervised anomaly detection method GLASS is, for the first time, slimmed down and deployed directly on the Sony IMX500 intelligent vision sensor, enabling real-time on-chip inference using only defect-free training data. By replacing the WideResNet-50 backbone with ResNet-18 and applying model compression, INT8 quantization, and memory optimization via the Sony Model Compression Toolkit, the model reduces the parameter count by 8.7× while attaining 94.2% image-level AUROC on MVTec-AD. The optimized model runs at 20 FPS and consumes 4.0 mJ per inference, for an energy efficiency of 470 GMAC/J, and its performance remains stable under moderate contamination of the training data.
📝 Abstract
Anomaly detection plays a key role in industrial quality control, where defects must be identified despite the scarcity of labeled faulty samples. Recent self-supervised approaches, such as GLASS, learn normal visual patterns using only defect-free data and have shown strong performance on industrial benchmarks. However, their computational requirements limit deployment on resource-constrained edge platforms.
This work introduces TinyGLASS, a lightweight adaptation of the GLASS framework designed for real-time in-sensor anomaly detection on the Sony IMX500 intelligent vision sensor. The proposed architecture replaces the original WideResNet-50 backbone with a compact ResNet-18 and introduces deployment-oriented modifications that enable static graph tracing and INT8 quantization using Sony's Model Compression Toolkit.
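To make the INT8 quantization step concrete, the following is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic illustration only, not the procedure used by Sony's Model Compression Toolkit; the function names and the weight values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.88, -0.31]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, scale, max_error)
```

Each weight is stored as one signed byte plus a shared scale, which is the mechanism that lets a quantized model fit in a small on-sensor memory.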
In addition to evaluating performance on the MVTec-AD benchmark, we investigate robustness to contaminated training data and introduce a custom industrial dataset, the MMS Dataset, for cross-device evaluation. Experimental results show that TinyGLASS achieves 8.7× parameter compression while maintaining competitive detection performance, reaching 94.2% image-level AUROC on MVTec-AD and operating at 20 FPS within the 8 MB memory constraint of the IMX500 platform.
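For readers unfamiliar with the metric, image-level AUROC can be read as the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison; the scores and labels are made-up values, not results from this work.

```python
def image_level_auroc(scores, labels):
    """AUROC via pairwise comparison; label 1 = anomalous, 0 = normal."""
    anomalous = [s for s, y in zip(scores, labels) if y == 1]
    normal = [s for s, y in zip(scores, labels) if y == 0]
    if not anomalous or not normal:
        raise ValueError("need at least one normal and one anomalous image")
    wins = 0.0
    for a in anomalous:
        for n in normal:
            if a > n:
                wins += 1.0   # anomalous image ranked above the normal one
            elif a == n:
                wins += 0.5   # ties count as half
    return wins / (len(anomalous) * len(normal))


# Hypothetical per-image anomaly scores and ground-truth labels.
scores = [0.10, 0.35, 0.20, 0.80, 0.30, 0.90]
labels = [0, 0, 0, 1, 1, 1]
auroc = image_level_auroc(scores, labels)
print(f"image-level AUROC: {auroc:.3f}")
```

The pairwise form is quadratic in the number of images and is meant for clarity; rank-based implementations give the same value more efficiently.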
System profiling demonstrates low energy consumption (4.0 mJ per inference), real-time end-to-end operation (20 FPS), and high energy efficiency (470 GMAC/J). Furthermore, the model maintains stable performance under moderate levels of training data contamination.
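The reported figures can be related by simple arithmetic. The sketch below assumes that the energy efficiency is defined as MACs per inference divided by energy per inference, and that the sensor runs inference continuously at the stated frame rate; both are assumptions, and the derived values are not stated in the abstract.

```python
# Figures taken from the abstract.
ENERGY_PER_INFERENCE_J = 4.0e-3   # 4.0 mJ per inference
EFFICIENCY_MAC_PER_J = 470e9      # 470 GMAC/J
FRAME_RATE_HZ = 20.0              # 20 FPS

# Implied workload per inference (assumes efficiency = MACs / energy).
macs_per_inference = EFFICIENCY_MAC_PER_J * ENERGY_PER_INFERENCE_J

# Time budget per frame at the stated frame rate.
frame_period_ms = 1000.0 / FRAME_RATE_HZ

# Average inference power under continuous operation (assumption);
# this covers inference energy only, not total system power.
avg_inference_power_mw = ENERGY_PER_INFERENCE_J * FRAME_RATE_HZ * 1000.0

print(f"{macs_per_inference / 1e9:.2f} GMAC per inference")
print(f"{frame_period_ms:.0f} ms per frame")
print(f"{avg_inference_power_mw:.0f} mW average inference power")
```

Under these assumptions the numbers imply roughly 1.88 GMAC per inference, a 50 ms frame period, and about 80 mW of average inference power.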