🤖 AI Summary
Problem: In real-world robotic grasping, point clouds captured from a few sparse viewpoints are often incomplete, which leads to wrong or inaccurate grasp poses. Method: This paper proposes 3DSGrasp, a grasping strategy built on a Transformer-based encoder-decoder network that completes the partial point cloud before grasp poses are generated. The network uses an Offset-Attention layer and is invariant to object pose and to the ordering of points, which keeps the completed geometry consistent. Results: On a wide range of partial point clouds, the method outperforms the best state-of-the-art approach on point cloud completion, and it largely improves grasp success rates in real-world scenarios.
📝 Abstract
Real-world robotic grasping can be done robustly if complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from a few sparse viewpoints before the grasping action, which leads to wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and to point permutation, so it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset are available at: https://github.com/NunoDuarte/3DSGrasp.
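The permutation claim can be illustrated with a small sketch. The abstract does not define the Offset-Attention layer, so the code below assumes a simplified form in which self-attention output is subtracted from the input features, projected, and added back as a residual. The function names, weight matrices, and sizes are hypothetical and are not taken from the 3DSGrasp code. The sketch only checks behavior under point reordering; it does not address invariance to object pose.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def offset_attention(feats, w_q, w_k, w_v, w_o):
    """Assumed simplified offset-attention over per-point features (N x d)."""
    q, k, v = matmul(feats, w_q), matmul(feats, w_k), matmul(feats, w_v)
    scale = math.sqrt(len(w_q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = softmax_rows([[s / scale for s in row] for row in matmul(q, k_t)])
    attended = matmul(scores, v)
    # The "offset" is the difference between the input and the attended features.
    offset = [[f - a for f, a in zip(fr, ar)] for fr, ar in zip(feats, attended)]
    projected = matmul(offset, w_o)
    # ReLU on the projected offset, then a residual connection to the input.
    return [[f + max(p, 0.0) for f, p in zip(fr, pr)]
            for fr, pr in zip(feats, projected)]


def max_pool(feats):
    """Order-independent global feature: column-wise maximum over all points."""
    return [max(col) for col in zip(*feats)]


def close(a, b, tol=1e-9):
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))


rng = random.Random(0)
n_points, dim = 32, 8  # hypothetical sizes chosen for the demo


def rand_matrix(rows, cols, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


feats = rand_matrix(n_points, dim)
w_q, w_k, w_v, w_o = (rand_matrix(dim, dim, 0.3) for _ in range(4))

perm = list(range(n_points))
rng.shuffle(perm)

out = offset_attention(feats, w_q, w_k, w_v, w_o)
out_perm = offset_attention([feats[i] for i in perm], w_q, w_k, w_v, w_o)

# Reordering the input points reorders the output rows in the same way.
equivariant = all(close(out_perm[j], out[i]) for j, i in enumerate(perm))
# A pooled global feature is therefore unchanged by the reordering.
invariant = close(max_pool(out_perm), max_pool(out))
print(equivariant, invariant)
```

In this sketch the attention layer itself is permutation-equivariant, and invariance appears only after the symmetric max-pooling step. Whether 3DSGrasp obtains its invariance the same way is not stated in the abstract.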