🤖 AI Summary
Problem: Industrial terminal strip detection suffers from scarce annotated real-world data and from a difficult sim-to-real domain transfer when training on synthetic images.
Method: This paper proposes a lightweight, efficient synthetic image generation method based on 3D CAD models and systematically evaluates how combining domain randomization with domain knowledge affects generalization in scenes with densely arranged, visually similar components.
Contribution/Results: We develop a low-cost, high-realism synthetic pipeline and publicly release a dataset comprising 30,000 synthetic images and 300 manually annotated real images. Trained solely on synthetic data, the DINO detector achieves 98.40% mAP on the real-world test set. This is the best score among the evaluated detectors (including YOLOv8 and Faster R-CNN), which all perform similarly. These results demonstrate the method's effectiveness and practicality for complex industrial visual inspection tasks.
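To make the combination of domain randomization and domain knowledge concrete, here is a minimal Python sketch of a per-image scene sampler. Everything in it is an illustrative assumption, not the paper's pipeline: the terminal catalogue `TERMINAL_WIDTHS_MM`, its widths, the parameter ranges, and the names `SceneParams` and `sample_scene` are hypothetical. Nuisance factors (camera pose, lighting, background) are drawn from broad ranges, while a domain-knowledge constraint keeps terminals edge to edge on a single rail, as on a real terminal strip.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical terminal variants and their widths along the rail in mm.
# A real pipeline would derive these from the CAD models.
TERMINAL_WIDTHS_MM = {"feed_through": 5.2, "ground": 5.2, "fuse": 8.0, "disconnect": 6.2}
BACKGROUNDS = ["cabinet_gray", "metal_plate", "noise_texture", "random_photo"]


@dataclass
class SceneParams:
    camera_distance_m: float
    camera_yaw_deg: float
    camera_pitch_deg: float
    light_intensity: float            # relative scale, 1.0 = nominal (assumed)
    color_temp_k: float
    background: str
    terminals: List[Tuple[str, float]]  # (type, left-edge offset along rail in mm)


def sample_scene(rng: random.Random, n_min: int = 5, n_max: int = 30) -> SceneParams:
    # Domain knowledge: terminals sit edge to edge on one rail, so each
    # position follows from the cumulative widths instead of free placement.
    n = rng.randint(n_min, n_max)
    terminals = []
    x = 0.0
    for _ in range(n):
        kind = rng.choice(list(TERMINAL_WIDTHS_MM))
        terminals.append((kind, x))
        x += TERMINAL_WIDTHS_MM[kind]
    # Domain randomization: nuisance factors drawn from broad, assumed ranges.
    return SceneParams(
        camera_distance_m=rng.uniform(0.3, 1.2),
        camera_yaw_deg=rng.uniform(-30.0, 30.0),
        camera_pitch_deg=rng.uniform(-20.0, 20.0),
        light_intensity=rng.uniform(0.3, 2.0),
        color_temp_k=rng.uniform(2700.0, 6500.0),
        background=rng.choice(BACKGROUNDS),
        terminals=terminals,
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a dataset run is reproducible
    for scene in (sample_scene(rng) for _ in range(3)):
        print(len(scene.terminals), scene.background, round(scene.camera_distance_m, 2))
```

Each sampled `SceneParams` would then be passed to a renderer (not shown) that loads the corresponding CAD models and produces the image and its bounding-box labels.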
📝 Abstract
In industrial manufacturing, deploying deep learning models for visual inspection is mostly hindered by the high and often intractable cost of collecting and annotating large-scale training datasets. While image synthesis from 3D CAD models is a common solution, the individual techniques of domain and rendering randomization used to create rich synthetic training datasets have mainly been studied in simple domains. Hence, their effectiveness on complex industrial tasks with densely arranged and similar objects remains unclear. In this paper, we investigate the sim-to-real generalization performance of standard object detectors on the complex industrial application of terminal strip object detection, carefully combining randomization and domain knowledge. We describe, step by step, the creation of our image synthesis pipeline, which achieves high realism with minimal implementation effort, and explain how this approach could be transferred to other industrial settings. Moreover, we created a dataset comprising 30,000 synthetic images and 300 manually annotated real images of terminal strips, which is publicly available for reference and future research. To provide a baseline that serves as a lower bound on the performance to be expected in these challenging industrial part detection tasks, we report the sim-to-real generalization performance of standard object detectors trained fully on the synthetic part of our dataset. While all considered models behave similarly, the transformer-based DINO model achieves the best score, with 98.40% mean average precision (mAP) on the real test set, demonstrating that our pipeline enables high-quality detections in complex industrial environments from existing CAD data with a manageable image synthesis effort.
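The headline result is reported as mean average precision. The abstract does not state which variant is used (for example, PASCAL VOC-style AP at an IoU threshold of 0.5, or COCO-style AP averaged over IoU thresholds from 0.5 to 0.95), so the following Python sketch assumes the VOC-style definition with all-point interpolation. The function names and input layout are hypothetical, and established evaluation toolkits differ in matching and interpolation details.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]        # (image_id, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection], gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """VOC-style AP for one class with all-point interpolation (assumed protocol)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    hits = []
    # Greedily match detections in order of decreasing confidence.
    for img, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            o = iou(box, g)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_j >= 0 and best_iou >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            hits.append(True)    # true positive
        else:
            hits.append(False)   # false positive: no match or duplicate
    precision, recall, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precision.append(tp / k)
        recall.append(tp / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_r) * p   # area under the interpolated PR curve
        prev_r = r
    return ap


def mean_average_precision(
        per_class: Dict[str, Tuple[List[Detection], Dict[str, List[Box]]]]) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, with two ground-truth boxes and one false positive ranked between the two true positives, full recall is reached at precision 2/3, so AP = 0.5 × 1 + 0.5 × 2/3 ≈ 0.833 under these assumptions.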