Industrial Synthetic Segment Pre-training

📅 2025-05-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Industrial instance segmentation suffers from legal restrictions (e.g., ImageNet’s commercial-use prohibition) and severe domain shift, leading to substantial performance degradation of existing vision foundation models (e.g., SAM) in industrial settings. To address this, we propose a foundation-model paradigm for industrial segmentation that requires no real images, human annotations, or commercial licenses. Our approach synthesizes the InsCore dataset via formula-driven supervised learning (FDSL), rendering fully annotated images that exhibit the complex occlusions, dense hierarchical instance masks, and diverse non-rigid shapes characteristic of industrial imagery. Pre-trained on only 100k synthetic images, our model achieves an average mAP gain of +6.2 points over fine-tuned SAM across five industrial benchmarks, using more than 100 times fewer images than the 11 million in SAM's SA-1B dataset.

πŸ“ Abstract
Pre-training on real-image datasets has been widely proven effective for improving instance segmentation. However, industrial applications face two key challenges: (1) legal and ethical restrictions, such as ImageNet's prohibition of commercial use, and (2) limited transferability due to the domain gap between web images and industrial imagery. Even recent vision foundation models, including the segment anything model (SAM), show notable performance degradation in industrial settings. These challenges raise critical questions: Can we build a vision foundation model for industrial applications without relying on real images or manual annotations? And can such models outperform even fine-tuned SAM on industrial datasets? To address these questions, we propose the Instance Core Segmentation Dataset (InsCore), a synthetic pre-training dataset based on formula-driven supervised learning (FDSL). InsCore generates fully annotated instance segmentation images that reflect key characteristics of industrial data, including complex occlusions, dense hierarchical masks, and diverse non-rigid shapes, distinct from typical web imagery. Unlike previous methods, InsCore requires neither real images nor human annotations. Experiments on five industrial datasets show that models pre-trained with InsCore outperform those trained on COCO and ImageNet-21k, as well as fine-tuned SAM, achieving an average improvement of 6.2 points in instance segmentation performance. This result is achieved using only 100k synthetic images, more than 100 times fewer than the 11 million images in SAM's SA-1B dataset, demonstrating the data efficiency of our approach. These findings position InsCore as a practical and license-free vision foundation model for industrial applications.
Problem

Research questions and friction points this paper is trying to address.

Legal and ethical restrictions on real-image datasets for industrial use
Domain gap between web images and industrial imagery limits transferability
Need for synthetic pre-training without real images or manual annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic dataset InsCore replaces real images
Formula-driven supervised learning generates annotations
Outperforms fine-tuned SAM while pre-training on over 100x fewer images than SA-1B
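The paper does not publish its rendering pipeline in this summary, but the core idea of formula-driven supervised learning can be sketched as follows: instance shapes are generated from a mathematical formula rather than captured from real images, and the ground-truth masks fall out of the rendering process for free. The sketch below is a minimal, hypothetical illustration (the function names, the sinusoidal contour formula, and all parameter ranges are assumptions, not InsCore's actual recipe): each instance is a non-rigid blob whose contour radius follows r(θ) = r₀·(1 + a·sin(kθ)), and later instances occlude earlier ones, producing the kind of overlapping, fully annotated masks the paper describes.

```python
import numpy as np

def formula_shape_mask(h, w, cx, cy, base_r, n_lobes, amp, rng):
    """Rasterize one non-rigid shape whose contour radius follows the
    formula r(theta) = base_r * (1 + amp * sin(n_lobes * theta + phase))."""
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    theta = np.arctan2(dy, dx)
    r = np.hypot(dx, dy)
    phase = rng.uniform(0.0, 2.0 * np.pi)
    boundary = base_r * (1.0 + amp * np.sin(n_lobes * theta + phase))
    return r <= boundary  # boolean mask of the shape's interior

def synth_instance_sample(h=128, w=128, n_instances=5, seed=0):
    """Generate one synthetic sample with no real image involved:
    an instance-id map (0 = background) where later shapes occlude
    earlier ones, plus the per-instance binary masks used as labels."""
    rng = np.random.default_rng(seed)
    id_map = np.zeros((h, w), dtype=np.int32)
    for inst in range(1, n_instances + 1):
        m = formula_shape_mask(
            h, w,
            cx=rng.uniform(0.2 * w, 0.8 * w),
            cy=rng.uniform(0.2 * h, 0.8 * h),
            base_r=rng.uniform(0.10, 0.25) * min(h, w),
            n_lobes=int(rng.integers(3, 9)),
            amp=rng.uniform(0.05, 0.4),
            rng=rng,
        )
        id_map[m] = inst  # painter's order: later instances occlude earlier
    masks = [id_map == i for i in range(1, n_instances + 1)]
    return id_map, masks
```

Because the label is derived from the same formula that draws the shape, annotation is exact and free; scaling this generator to 100k varied samples is what stands in for a licensed real-image corpus.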