XYZ-IBD: High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity

📅 2025-05-31

🤖 AI Summary

Existing 6D pose datasets predominantly target domestic scenes and fail to capture realistic industrial bin-picking challenges, such as metallic reflectivity, severe occlusion, high-density clutter, and symmetric, texture-less objects. To address this gap, we propose XYZ-IBD, a high-precision, industrial-grade 6D pose benchmark explicitly designed for bin-picking. It comprises 75 multi-view real-world scenes and a large-scale synthetic dataset rendered under simulated bin-picking conditions. Ground truth comes from an annotation pipeline that combines anti-reflection spray, multi-view depth fusion, and semi-automatic annotation, achieving millimeter-level accuracy. Benchmarking reveals substantial performance degradation of current state-of-the-art methods on XYZ-IBD, underscoring its difficulty and realism. The dataset and benchmark are publicly released to advance robust, deployable 6D perception research for industrial automation.
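Symmetric objects matter at evaluation time: a pose that is off by a symmetry rotation looks identical to the ground truth. This page does not say which pose metrics the XYZ-IBD benchmark reports, so the sketch below is only an illustration of the general issue, contrasting the classic ADD error with its symmetry-aware variant ADD-S on a toy, rotationally symmetric point set. The function names and the toy model are hypothetical, not taken from the dataset's toolkit.

```python
import math


def transform(R, t, p):
    """Apply the rigid transform x -> R x + t to one 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_error(model, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding model points under the two poses."""
    return sum(
        math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p)) for p in model
    ) / len(model)


def adds_error(model, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each GT-posed point to the closest estimated-posed point.

    Using the closest point instead of the corresponding one means poses that
    differ only by an object symmetry receive (near) zero error.
    """
    gt = [transform(R_gt, t_gt, p) for p in model]
    est = [transform(R_est, t_est, p) for p in model]
    return sum(min(math.dist(g, e) for e in est) for g in gt) / len(gt)


# Hypothetical toy "object": four points on a circle of radius 10 mm (4-fold symmetric about z).
model = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (-10.0, 0.0, 0.0), (0.0, -10.0, 0.0)]
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
RZ90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))  # 90 deg about z
t0 = (0.0, 0.0, 0.0)

print(add_error(model, I3, t0, RZ90, t0))   # about 14.14 mm
print(adds_error(model, I3, t0, RZ90, t0))  # 0.0
```

A 90° turn about the symmetry axis produces a large ADD but zero ADD-S, which is why symmetry-aware errors are needed when most objects in a dataset are symmetric.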

📝 Abstract

We introduce XYZ-IBD, a bin-picking dataset for 6D pose estimation that captures real-world industrial complexity, including challenging object geometries, reflective materials, severe occlusions, and dense clutter. The dataset reflects authentic robotic manipulation scenarios with millimeter-accurate annotations. Unlike existing datasets that primarily focus on household objects, which are approaching saturation, XYZ-IBD represents realistic industrial conditions that remain unsolved. The dataset features 15 texture-less, metallic, and mostly symmetrical objects of varying shapes and sizes. These objects are heavily occluded and randomly arranged in bins with high density, replicating the challenges of real-world bin-picking. XYZ-IBD was collected using two high-precision industrial cameras and one commercially available camera, providing RGB, grayscale, and depth images. It contains 75 multi-view real-world scenes, along with a large-scale synthetic dataset rendered under simulated bin-picking conditions. We employ a meticulous annotation pipeline that includes anti-reflection spray, multi-view depth fusion, and semi-automatic annotation, achieving the millimeter-level pose labeling accuracy required for industrial manipulation. Quantitative evaluation in simulated environments confirms the reliability of the ground-truth annotations. We benchmark state-of-the-art methods on 2D detection, 6D pose estimation, and depth estimation tasks on our dataset, revealing significant performance degradation in our setups compared to current academic household benchmarks. By capturing the complexity of real-world bin-picking scenarios, XYZ-IBD introduces more realistic and challenging problems for future research. The dataset and benchmark are publicly available at https://xyz-ibd.github.io/XYZ-IBD/.
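The abstract says annotation reliability was quantified in simulated environments. The underlying idea is that in simulation every object's true pose is known, so running the annotation pipeline on rendered scenes yields a direct error measurement. The exact protocol is not described here; the sketch below assumes each pose is a 3×3 rotation matrix plus a translation in millimetres, and reports the geodesic rotation error and the Euclidean translation error. All function names are hypothetical.

```python
import math


def rot_z(deg):
    """Rotation matrix about the z axis (used here only to build example poses)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def rotation_error_deg(R_a, R_b):
    """Geodesic angle between two rotations: acos((trace(R_a^T R_b) - 1) / 2)."""
    # trace(R_a^T R_b) equals the element-wise dot product of the two matrices.
    tr = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_theta))


def translation_error_mm(t_a, t_b):
    """Euclidean distance between two translations, in the poses' unit (assumed mm)."""
    return math.dist(t_a, t_b)


def summarize(pairs):
    """pairs: list of ((R_annotated, t_annotated), (R_true, t_true)) from a simulated scene."""
    rot = [rotation_error_deg(a[0], b[0]) for a, b in pairs]
    trans = [translation_error_mm(a[1], b[1]) for a, b in pairs]
    return {
        "mean_rot_deg": sum(rot) / len(rot),
        "max_rot_deg": max(rot),
        "mean_trans_mm": sum(trans) / len(trans),
        "max_trans_mm": max(trans),
    }


I3 = rot_z(0.0)
pairs = [
    ((I3, (0.0, 0.0, 0.0)), (rot_z(2.0), (0.0, 0.0, 1.0))),  # 2 deg and 1 mm off
    ((I3, (0.0, 0.0, 0.0)), (I3, (0.0, 0.0, 0.0))),          # exact match
]
print(summarize(pairs))
```

Reporting the maximum alongside the mean matters for this kind of check, because a single badly annotated object can hide behind a small average.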

Problem

Addresses 6D pose estimation in complex industrial bin-picking scenarios

Captures challenges such as severe occlusion, reflective materials, and dense clutter that household benchmarks lack

Provides high-precision dataset for realistic robotic manipulation research

Innovation

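Two high-precision industrial cameras plus one commercially available camera, providing RGB, grayscale, and depth images

Multi-view depth fusion, combined with anti-reflection spray and semi-automatic annotation, for millimeter-level pose labels

Large-scale synthetic dataset rendered under simulated bin-picking conditions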
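The core geometric step behind multi-view depth fusion is standard: back-project each depth image with the camera's pinhole intrinsics, move the points into a shared world frame using that view's camera-to-world pose, and merge overlapping measurements. The page does not say which fusion method XYZ-IBD actually uses. The sketch below assumes calibrated pinhole intrinsics (fx, fy, cx, cy), known camera-to-world extrinsics, and depth in millimetres with 0 marking invalid pixels, and it uses simple voxel averaging as the merge step. All names are hypothetical.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image (list of rows, mm, 0 = invalid) into camera-frame 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels with no depth reading
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points, R, t):
    """Map camera-frame points to the world frame: x_w = R x_c + t (camera-to-world pose)."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def fuse_views(views):
    """Concatenate world-frame points from every (depth, (fx, fy, cx, cy), R, t) view."""
    cloud = []
    for depth, (fx, fy, cx, cy), R, t in views:
        cloud.extend(to_world(backproject(depth, fx, fy, cx, cy), R, t))
    return cloud


def voxel_downsample(cloud, voxel_mm):
    """Merge overlapping measurements by averaging all points that share a voxel."""
    cells = {}
    for p in cloud:
        key = tuple(math.floor(c / voxel_mm) for c in p)
        cells.setdefault(key, []).append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts)) for pts in cells.values()]


# Two single-pixel toy views of the same surface point at (0, 0, 500) mm in the world frame.
# View B's camera sits 10 mm along +x; its cx is chosen so its one pixel sees that point.
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
views = [
    ([[500.0]], (100.0, 100.0, 0.0, 0.0), I3, (0.0, 0.0, 0.0)),
    ([[500.0]], (100.0, 100.0, 2.0, 0.0), I3, (10.0, 0.0, 0.0)),
]
cloud = fuse_views(views)
fused = voxel_downsample(cloud, voxel_mm=1.0)
print(len(cloud), fused)  # 2 [(0.0, 0.0, 500.0)]
```

Averaging within voxels reduces per-view noise where views overlap; a production pipeline would more likely use TSDF fusion and outlier filtering, which this sketch omits.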

🔎 Similar Papers

Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation