🤖 AI Summary
To address performance degradation in 2D detection of unseen objects under complex backgrounds, low illumination, and clutter in industrial settings, this paper proposes a standardized, plug-and-play robust detection pipeline. Methodologically, it combines foundation-model-guided open-vocabulary detection, SAM-based instance segmentation, adaptive background suppression, and low-light image enhancement in a single framework, which mitigates SAM false positives and reduces domain shift. The pipeline requires no prior knowledge of target objects and enables zero-shot generalization. Evaluated on real-world industrial bin-picking benchmarks from BOP, it achieves substantial gains in detection accuracy with minimal inference overhead, demonstrating both effectiveness and edge-deployment feasibility. The core contribution is presented as the first lightweight, multi-stage robust detection paradigm designed specifically for unseen industrial objects, balancing accuracy and efficiency.
📝 Abstract
Accurate 6D pose estimation is essential for robotic manipulation in industrial environments. Existing pipelines typically rely on off-the-shelf object detectors followed by cropping and pose refinement, but their performance degrades under challenging conditions such as clutter, poor lighting, and complex backgrounds, making detection the critical bottleneck. In this work, we introduce a standardized, plug-in pipeline for 2D detection of unseen objects in industrial settings. Built on current state-of-the-art (SOTA) baselines, our approach reduces domain shift and background artifacts through low-light image enhancement and background removal guided by open-vocabulary detection with foundation models. This design suppresses the false positives prevalent in raw SAM outputs, yielding more reliable detections for downstream pose estimation. Extensive experiments on real-world industrial bin-picking benchmarks from BOP demonstrate that our method significantly boosts detection accuracy while incurring negligible inference overhead, showing its effectiveness and practicality.
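The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the enhancement method or the filtering rule, so gamma correction stands in for low-light enhancement, and a simple area-containment test stands in for background removal guided by open-vocabulary detection. The names `Box`, `brightening_lut`, `filter_proposals`, and the `min_inside` threshold are all hypothetical, and segment proposals are reduced to their bounding boxes to keep the example dependency-free.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box (x0, y0, x1, y1) in pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, w) * max(0.0, h)


def brightening_lut(mean_intensity: float, target_mean: float = 0.5) -> list:
    """Assumed stand-in for low-light enhancement: a gamma lookup table.

    Gamma is chosen so that the image's mean intensity maps to target_mean,
    and is capped at 1.0 so already-bright images are left unchanged.
    """
    mean = min(max(mean_intensity / 255.0, 1e-6), 1.0 - 1e-6)
    gamma = min(1.0, math.log(target_mean) / math.log(mean))
    return [round(255 * (v / 255.0) ** gamma) for v in range(256)]


def filter_proposals(proposals, foreground_boxes, min_inside: float = 0.8):
    """Assumed stand-in for detection-guided background removal.

    Keeps a segment proposal only if at least min_inside of its area lies
    inside some foreground box from the open-vocabulary detector.
    """
    kept = []
    for p in proposals:
        if p.area() == 0:
            continue
        best = max(
            (intersection_area(p, f) / p.area() for f in foreground_boxes),
            default=0.0,
        )
        if best >= min_inside:
            kept.append(p)
    return kept


if __name__ == "__main__":
    lut = brightening_lut(mean_intensity=20)      # a dark image
    print("intensity 20 is mapped to", lut[20])   # brighter than 20

    foreground = [Box(10, 10, 50, 50)]            # from the detector (made up)
    proposals = [Box(20, 20, 40, 40),             # inside the foreground
                 Box(60, 60, 90, 90),             # background clutter
                 Box(40, 40, 60, 60)]             # mostly outside
    print(filter_proposals(proposals, foreground))
```

In this toy setup only the first proposal survives, which mirrors the stated goal of discarding segment proposals that fall on the background before they reach the pose estimator. A real pipeline would operate on pixel masks rather than boxes and would tune the threshold per dataset.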