RGBSQGrasp: Inferring Local Superquadric Primitives from Single RGB Image for Graspability-Aware Bin Picking

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
In unstructured bin picking, monocular RGB-based local geometric modeling suffers from poor accuracy and limited generalization in the absence of depth sensors and CAD models. Method: This paper proposes a superquadric (SQ)-driven grasping framework that requires neither depth sensors nor CAD priors. It first estimates a dense point cloud from a single RGB image using foundation models; it then introduces a global-local collaborative SQ fitting network to infer physically interpretable local geometric primitives; finally, it employs an SQ-guided grasp sampling strategy to generate stable, feasible 6D grasps from a single viewpoint. The method integrates metric depth estimation, cross-platform synthetic data generation, and end-to-end optimization. Results: Evaluated on real robotic hardware, the approach achieves a 92% grasp success rate and demonstrates significantly improved robustness and generalization to unknown shapes, severe occlusions, and textureless objects.

📝 Abstract
Bin picking is a challenging robotic task due to occlusions and physical constraints that limit visual information for object recognition and grasping. Existing approaches often rely on known CAD models or prior object geometries, restricting generalization to novel or unknown objects. Other methods directly regress grasp poses from RGB-D data without object priors, but the inherent noise in depth sensing and the lack of object understanding make grasp synthesis and evaluation more difficult. Superquadrics (SQ) offer a compact, interpretable shape representation that captures the physical and graspability understanding of objects. However, recovering them from limited viewpoints is challenging, as existing methods rely on multiple perspectives for near-complete point cloud reconstruction, limiting their effectiveness in bin picking. To address these challenges, we propose RGBSQGrasp, a grasping framework that leverages superquadric shape primitives and foundation metric depth estimation models to infer grasp poses from a monocular RGB camera, eliminating the need for depth sensors. Our framework integrates a universal, cross-platform dataset generation pipeline, a foundation model-based object point cloud estimation module, a global-local superquadric fitting network, and an SQ-guided grasp pose sampling module. By integrating these components, RGBSQGrasp reliably infers grasp poses through geometric reasoning, enhancing grasp stability and adaptability to unseen objects. Real-world robotic experiments demonstrate a 92% grasp success rate, highlighting the effectiveness of RGBSQGrasp in packed bin-picking environments.
Problem

Research questions and friction points this paper is trying to address.

Infer grasp poses from a single RGB image for bin picking.
Overcome the limitations of depth sensors and unknown object geometries.
Enhance grasp stability and adaptability using superquadric shape primitives.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular RGB camera for grasp pose inference
Superquadric shape primitives for object representation
Foundation model-based point cloud estimation
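The superquadric primitives underpinning the method are defined by the standard inside-outside function (Barr, 1981): a point is inside the primitive when the function value is below 1, on its surface at exactly 1, and outside above 1. The following NumPy sketch illustrates this representation; it is an independent illustration of the standard formula, not the authors' fitting code.

```python
import numpy as np

def superquadric_implicit(points, scale, eps):
    """Inside-outside function of a superquadric (Barr, 1981).

    F < 1: point inside the primitive; F == 1: on the surface; F > 1: outside.
    points: (N, 3) array of points in the primitive's local frame.
    scale:  (a1, a2, a3) semi-axis lengths along x, y, z.
    eps:    (e1, e2) shape exponents (e1: north-south, e2: east-west roundness).
    """
    a1, a2, a3 = scale
    e1, e2 = eps
    # Absolute values keep the fractional powers real-valued.
    x, y, z = np.abs(points).T
    xy = (x / a1) ** (2.0 / e2) + (y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + (z / a3) ** (2.0 / e1)

# Example: a unit sphere (a1=a2=a3=1, e1=e2=1).
pts = np.array([[0.0, 0.0, 0.0],   # center
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
F = superquadric_implicit(pts, (1.0, 1.0, 1.0), (1.0, 1.0))
# F = [0.0, 1.0, 4.0]
```

Varying (e1, e2) morphs the same five-parameter-plus-pose primitive between ellipsoids, boxes, and cylinders, which is what makes SQs a compact, interpretable target for fitting to partial point clouds.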
Authors

Yifeng Xu, University of Michigan, Ann Arbor, MI, 48109, USA
Fan Zhu, Bayanat (3D computer vision, deep learning)
Ye Li, University of Michigan, Ann Arbor, MI, 48109, USA
Sebastian Ren, University College London, London, WC1E 6BT, UK
Xiaonan Huang, University of Michigan, Ann Arbor, MI, 48109, USA
Yuhao Chen, University of Waterloo, Waterloo, ON, N2L 3G1, Canada