🤖 AI Summary
In automotive manufacturing, visual quality inspection heavily relies on large-scale manually annotated real-world data, incurring prohibitive labeling costs. To address this, we propose a novel unsupervised image classification paradigm: leveraging DINOv2 to extract robust visual features, constructing a feature database from high-fidelity synthetically generated reference images, and performing defect identification via cosine similarity–based nearest-neighbor retrieval. Crucially, our approach is the first to substitute real annotated samples entirely with high-quality synthetic data, eliminating dependence on extensive real-world annotations. The resulting end-to-end inspection framework achieves production-grade performance across eight real-world assembly line tasks, matching the classification accuracy of fully supervised models while drastically reducing data acquisition and annotation overhead. This work establishes a scalable, cost-effective pathway for deploying industrial visual inspection systems.
📝 Abstract
Visual quality inspection in automotive production is essential for ensuring the safety and reliability of vehicles. Computer vision (CV) has become a popular solution for these inspections due to its cost-effectiveness and reliability. However, CV models require large, annotated datasets, which are costly and time-consuming to collect. To reduce the need for extensive training data, we propose a novel image classification pipeline that combines similarity search using a vision foundation model with synthetic data. Our approach leverages a DINOv2 model to transform input images into feature vectors, which are then compared to pre-classified reference images using cosine distance. By utilizing synthetic data instead of real images as references, our pipeline achieves high classification accuracy without relying on any real training data. We evaluate this approach in eight real-world inspection scenarios and demonstrate that it meets the high performance requirements of production environments.
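The retrieval step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy 3-D vectors stand in for DINOv2 embeddings (which in the actual pipeline would come from a forward pass over synthetic reference images), and the function names are hypothetical.

```python
import numpy as np

def build_reference_db(features, labels):
    """Store L2-normalized reference vectors so a dot product
    equals cosine similarity (illustrative stand-in for a database
    of DINOv2 embeddings of synthetic reference images)."""
    feats = np.asarray(features, dtype=np.float64)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return feats, list(labels)

def classify(query, ref_feats, ref_labels):
    """Return the label of the nearest reference under cosine similarity."""
    q = np.asarray(query, dtype=np.float64)
    q = q / np.linalg.norm(q)
    sims = ref_feats @ q  # cosine similarity to every reference vector
    return ref_labels[int(np.argmax(sims))]

# Toy example: two reference classes, one query close to the "OK" class.
refs, labels = build_reference_db(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], ["OK", "defect"])
print(classify([0.9, 0.1, 0.0], refs, labels))  # -> OK
```

Because the reference vectors are normalized once at database-build time, each query reduces to a single matrix-vector product followed by an argmax, which keeps per-image inference cheap even with many reference images.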