Pickalo: Leveraging 6D Pose Estimation for Low-Cost Industrial Bin Picking

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses severe occlusion, cluttered stacking, and the high cost of 3D sensing in industrial bin picking by proposing a modular 6D pose-based grasping system built on low-cost hardware: a UR5e robotic arm with a parallel-jaw gripper and an Intel RealSense D435i sensor. The system combines multi-view active exploration with the wrist-mounted RGB-D camera, BridgeDepth refinement of the raw stereo streams into depth maps suitable for collision reasoning, Mask R-CNN instance segmentation trained purely on synthetic data, zero-shot 6D pose estimation via SAM-6D, and a pose buffer that fuses multi-view observations over time, handles object symmetries, and reduces pose noise. Antipodal grasp candidates are generated and curated offline; online, grasps are selected through utility-based ranking and fast collision checking. In experiments on densely filled euroboxes, the system achieves up to 600 mean picks per hour with 96–99% grasp success and remains robust over 30-minute runs, and ablation studies show that both the enhanced depth estimation and the pose buffer contribute to long-term stability and throughput.
📝 Abstract
Bin picking in real industrial environments remains challenging due to severe clutter, occlusions, and the high cost of traditional 3D sensing setups. We present Pickalo, a modular 6D pose-based bin-picking pipeline built entirely on low-cost hardware. A wrist-mounted RGB-D camera actively explores the scene from multiple viewpoints, while raw stereo streams are processed with BridgeDepth to obtain refined depth maps suitable for accurate collision reasoning. Object instances are segmented with a Mask-RCNN model trained purely on photorealistic synthetic data and localized using the zero-shot SAM-6D pose estimator. A pose buffer module fuses multi-view observations over time, handling object symmetries and significantly reducing pose noise. Offline, we generate and curate large sets of antipodal grasp candidates per object; online, a utility-based ranking and fast collision checking are queried for the grasp planning. Deployed on a UR5e with a parallel-jaw gripper and an Intel RealSense D435i, Pickalo achieves up to 600 mean picks per hour with 96-99% grasp success and robust performance over 30-minute runs on densely filled euroboxes. Ablation studies demonstrate the benefits of enhanced depth estimation and of the pose buffer for long-term stability and throughput in realistic industrial conditions. Videos are available at https://mesh-iit.github.io/project-jl2-camozzi/
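To make the pose buffer idea concrete, the following is a minimal sketch of fusing repeated 6D pose observations of an object with a discrete symmetry. It is not the authors' implementation: the `PoseBuffer` class, the sliding-window size, the choice of a chordal (SVD-based) rotation mean, and the example symmetry group (a 180° rotation about the object's z-axis) are all assumptions made for illustration.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix about the z-axis (helper for the example)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def geodesic_angle(Ra, Rb):
    """Angle (radians) of the relative rotation between Ra and Rb."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

def canonicalize(R, R_ref, symmetries):
    """Pick the symmetry-equivalent rotation R @ S closest to R_ref."""
    candidates = [R @ S for S in symmetries]
    return min(candidates, key=lambda Rc: geodesic_angle(R_ref, Rc))

def chordal_mean(rotations):
    """Project the sum of rotation matrices back onto SO(3) via SVD."""
    M = np.sum(rotations, axis=0)
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

class PoseBuffer:
    """Hypothetical sliding-window buffer for one tracked object."""

    def __init__(self, symmetries, max_size=10):
        self.symmetries = symmetries  # list of 3x3 symmetry rotations
        self.max_size = max_size      # assumed window length
        self.rotations = []
        self.translations = []

    def add(self, R, t):
        # Resolve the symmetry ambiguity against the oldest buffered pose.
        if self.rotations:
            R = canonicalize(R, self.rotations[0], self.symmetries)
        self.rotations.append(R)
        self.translations.append(np.asarray(t, dtype=float))
        # Keep only the most recent observations.
        self.rotations = self.rotations[-self.max_size:]
        self.translations = self.translations[-self.max_size:]

    def fused(self):
        """Return the averaged rotation and translation."""
        return chordal_mean(self.rotations), np.mean(self.translations, axis=0)

# Example: the second observation is flipped by the object's symmetry.
symmetries = [np.eye(3), rot_z(np.pi)]
buffer = PoseBuffer(symmetries)
buffer.add(rot_z(0.10), [0.0, 0.0, 0.50])
buffer.add(rot_z(0.12) @ rot_z(np.pi), [0.0, 0.0, 0.52])
R_fused, t_fused = buffer.fused()
```

In this sketch, the flipped observation is first mapped back to the symmetry-equivalent rotation nearest the buffered one, so the averaging step sees two nearby rotations rather than two that differ by half a turn. A real system would also need to associate observations with object instances and reject outliers, which is omitted here.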
Problem

Research questions and friction points this paper is trying to address.

bin picking
6D pose estimation
low-cost sensing
industrial automation
occlusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

6D pose estimation
low-cost bin picking
synthetic-to-real transfer
multi-view pose fusion
antipodal grasp planning
Alessandro Tarsi
Institut des Systèmes Intelligents et de Robotique (ISIR), Paris 75005, France
Matteo Mastrogiuseppe
Generative Bionics, Genova 16152, Italy
Saverio Taliani
Generative Bionics, Genova 16152, Italy
Simone Cortinovis
Generative Bionics, Genova 16152, Italy
Ugo Pattacini
Technologist, Istituto Italiano di Tecnologia
Humanoid Robotics, Control Engineering, Optimization, Real-Time Systems, Digital Engineering