🤖 AI Summary
Foundation models (FMs) exhibit strong performance on public benchmarks but show questionable generalization to real-world industrial inspection images, raising concerns about their practical applicability in defect identification.
Method: We systematically evaluate state-of-the-art vision FMs under zero-shot classification and text-prompted paradigms, using both public benchmarks and a newly constructed industrial defect dataset comprising authentic production-line imagery.
Contribution/Results: Experiments reveal that while all models achieve high accuracy on public benchmarks, they consistently fail on real industrial images—exposing a critical cross-domain generalization bottleneck. This challenges the “plug-and-play” assumption for FM deployment in industrial vision and empirically uncovers the fundamental domain gap between academic benchmarks and factory-floor conditions. Our work provides the first large-scale, real-world industrial defect benchmark and delivers key empirical evidence to guide future research on domain-adaptive foundation model tuning for industrial visual inspection.
📝 Abstract
Foundation Models (FMs) have shown impressive performance on various text and image processing tasks. They can generalize across domains and datasets in a zero-shot setting. This could make them suitable for automated quality inspection during series manufacturing, where various types of images are evaluated for many different products. Replacing tedious labeling tasks with a simple text prompt describing anomalies, and reusing the same models across many products, would save significant effort during model setup and implementation. This is a strong advantage over supervised Artificial Intelligence (AI) models, which are trained for individual applications and require labeled training data. We test multiple recent FMs on both custom real-world industrial image data and public image data. We show that all of these models fail on our real-world data, while the very same models perform well on public benchmark datasets.
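The zero-shot, text-prompted setup the abstract refers to can be sketched as follows. This is a minimal illustration of CLIP-style classification, not the paper's actual pipeline: an image embedding is compared against text-prompt embeddings by cosine similarity, and the best-matching prompt becomes the prediction. The embeddings here are mock NumPy vectors standing in for the output of a real vision-language model; the prompts and the temperature value are illustrative assumptions.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels, temperature=100.0):
    """Pick the text prompt whose embedding best matches the image embedding.

    Mirrors the CLIP-style recipe: L2-normalize both sides, take cosine
    similarities, and convert them to probabilities with a scaled softmax.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                        # cosine similarity per prompt
    logits = temperature * sims
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return labels[int(np.argmax(probs))], probs

# Mock embeddings standing in for a real FM's encoder outputs.
rng = np.random.default_rng(0)
defect_emb = rng.normal(size=512)           # hypothetical "defective" text embedding
ok_emb = rng.normal(size=512)               # hypothetical "flawless" text embedding
labels = ["a photo of a scratched part", "a photo of a flawless part"]
text_embs = np.stack([defect_emb, ok_emb])

# Simulate an image whose embedding lies close to the "defective" prompt.
image_emb = defect_emb + 0.1 * rng.normal(size=512)

label, probs = zero_shot_classify(image_emb, text_embs, labels)
```

In a real deployment, `image_emb` and `text_embs` would come from a model such as CLIP; the paper's finding is precisely that this recipe, while effective on public benchmarks, breaks down on authentic production-line imagery.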