Generative AI for Industrial Contour Detection: A Language-Guided Vision System

📅 2025-08-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional edge detectors in industrial computer vision lack robustness to noise, material variability, and non-ideal imaging conditions. This paper proposes a language-guided generative framework that combines a conditional generative adversarial network (cGAN) with multimodal vision-language models (GPT-image-1 and Gemini 2.0 Flash) to generate and then refine remnant contours at CAD-level precision. It introduces standardized, human-editable prompts, developed in a human-in-the-loop process and applied through image-text guided synthesis, in place of handcrafted feature engineering and single-modality modeling. On the proprietary FabTrack datasets, the method improves contour fidelity, edge continuity, and geometric alignment while reducing manual tracing effort. Under standardized conditions, GPT-image-1 consistently outperforms Gemini 2.0 Flash in structural accuracy and perceptual quality.

📝 Abstract

Industrial computer vision systems often struggle with noise, material variability, and uncontrolled imaging conditions, limiting the effectiveness of classical edge detectors and handcrafted pipelines. In this work, we present a language-guided generative vision system for remnant contour detection in manufacturing, designed to achieve CAD-level precision. The system is organized into three stages: data acquisition and preprocessing, contour generation using a conditional GAN, and multimodal contour refinement through vision-language modeling, where standardized prompts are crafted in a human-in-the-loop process and applied through image-text guided synthesis. On proprietary FabTrack datasets, the proposed system improved contour fidelity, enhancing edge continuity and geometric alignment while reducing manual tracing. For the refinement stage, we benchmarked several vision-language models, including Google's Gemini 2.0 Flash, OpenAI's GPT-image-1 integrated within a VLM-guided workflow, and open-source baselines. Under standardized conditions, GPT-image-1 consistently outperformed Gemini 2.0 Flash in both structural accuracy and perceptual quality. These findings demonstrate the promise of VLM-guided generative workflows for advancing industrial computer vision beyond the limitations of classical pipelines.
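
The abstract does not state the training objective for the contour-generation stage. As a point of reference only, the standard conditional GAN objective can be written as follows, where x is the input image, y the target contour map, z a noise vector, G the generator, and D the discriminator (all symbols are assumptions introduced here, not the paper's notation):

```latex
\min_G \max_D \; \mathcal{L}_{\mathrm{cGAN}}(G, D) =
  \mathbb{E}_{x, y}\big[\log D(x, y)\big] +
  \mathbb{E}_{x, z}\big[\log\big(1 - D(x, G(x, z))\big)\big]
```

Because the discriminator sees the input image alongside either a real or a generated contour, the generator is pushed toward contours that are plausible for that specific input rather than plausible in general. Any additional loss terms the authors may use are not described in this summary.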

Problem

Research questions and friction points this paper is trying to address.

Detecting precise industrial contours under noisy conditions

Overcoming material variability in manufacturing vision systems

Achieving CAD-level precision with generative AI workflows
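
The summary does not say how contour fidelity or edge continuity are measured. As a rough illustration only, the sketch below uses two common proxies: intersection-over-union against a reference contour, and the number of 8-connected fragments (1 means an unbroken contour). The function names and the toy data are assumptions made for this example, not the paper's evaluation protocol.

```python
from collections import deque


def iou(pred, ref):
    """Intersection-over-union of two sets of (row, col) contour pixels."""
    pred, ref = set(pred), set(ref)
    union = pred | ref
    return len(pred & ref) / len(union) if union else 1.0


def count_fragments(pixels):
    """Count 8-connected components; 1 means the contour has no gaps."""
    remaining = set(pixels)
    fragments = 0
    while remaining:
        fragments += 1
        queue = deque([remaining.pop()])
        # Breadth-first flood fill over the 8-neighbourhood.
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        queue.append(neighbour)
    return fragments


# Toy data: the reference is a 4-pixel horizontal edge; the prediction has a gap.
reference = {(0, 0), (0, 1), (0, 2), (0, 3)}
predicted = {(0, 0), (0, 1), (0, 3)}

print(iou(predicted, reference))   # 0.75
print(count_fragments(predicted))  # 2
```

A refinement stage that closes the gap at (0, 2) would raise the overlap to 1.0 and reduce the fragment count to 1, which is the kind of change "edge continuity" and "geometric alignment" point at.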

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional GAN for contour generation

Vision-language modeling for refinement

Human-in-the-loop prompt engineering
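
To make "standardized, human-editable prompts" concrete, here is a minimal sketch of a prompt template whose parameters an operator could adjust without rewriting the instruction text. The class name, every field, and the prompt wording are hypothetical; the paper's actual prompts are not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContourPrompt:
    """Hypothetical prompt schema; all fields are illustrative assumptions."""
    material: str
    line_color: str = "black"
    background: str = "white"
    line_width_px: int = 2

    def render(self) -> str:
        # Fixed wording with a few editable slots keeps runs comparable.
        return (
            f"Refine the contour of the {self.material} remnant in the attached image. "
            f"Draw a single closed {self.line_color} outline, {self.line_width_px} px wide, "
            f"on a plain {self.background} background. "
            "Close small gaps, remove isolated noise, and do not alter the overall shape."
        )


prompt = ContourPrompt(material="sheet-metal")
print(prompt.render())
```

Keeping the wording fixed and exposing only a few fields is one way to hold conditions constant when comparing refinement models, since each model receives the same instruction for the same input image.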
