🤖 AI Summary
Single-view point clouds suffer from geometric incompleteness, leading to low 6-DoF grasp pose estimation accuracy. Method: This paper proposes a grasp framework integrating point cloud completion and executability modeling. It explicitly encodes completed point clouds as geometric features fed into the grasp network, enforcing the learning of complete-shape priors. Additionally, it introduces a learnable grasp quality scoring mechanism coupled with an adaptive threshold filtering module to bridge the sim-to-real gap. The framework integrates state-of-the-art completion networks (PCN/MPRNet), grasp prediction networks (PointNet++/PAConv), and a score-based filtering mechanism. Results: Evaluated on a real robotic platform, the method achieves a 17.8% absolute improvement in grasp success rate over the prior state of the art, and exhibits strong robustness and generalization across arbitrary camera viewpoints.
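As a rough illustration of the shape-prior idea (not the authors' implementation), the completed point cloud can be pooled into a global shape feature and concatenated onto each observed 2.5D point before grasp prediction. The max-pooling and fusion below are simplified stand-ins for the learned PCN/MPRNet encoder and PointNet++/PAConv grasp network named above; all function names here are hypothetical.

```python
import numpy as np

def global_shape_feature(complete_pts: np.ndarray) -> np.ndarray:
    """Toy global descriptor of the completed cloud.

    Stands in for a learned completion-network feature; here we
    simply max-pool the xyz coordinates over all completed points.
    """
    return complete_pts.max(axis=0)  # shape (3,)

def fuse_shape_prior(partial_pts: np.ndarray, shape_feat: np.ndarray) -> np.ndarray:
    """Append the global shape feature to every observed 2.5D point,
    giving the downstream grasp network access to a complete-shape prior."""
    n = partial_pts.shape[0]
    tiled = np.tile(shape_feat, (n, 1))          # (n, d)
    return np.concatenate([partial_pts, tiled], axis=1)  # (n, 3 + d)

# Example: 100 observed single-view points, 500-point completion result
rng = np.random.default_rng(0)
partial = rng.random((100, 3))
complete = rng.random((500, 3))
features = fuse_shape_prior(partial, global_shape_feature(complete))
```

In a real pipeline, `features` would replace the raw xyz input of the grasp prediction backbone, so every point carries the estimated complete geometry alongside its observed position.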
📝 Abstract
Point-cloud-based 6-Degree-of-Freedom (DoF) grasping has shown significant potential for enabling robots to grasp target objects. However, most existing methods operate on point clouds (2.5D points) generated from single-view depth images. Such point clouds cover only one visible surface of the object and therefore provide incomplete geometric information, which misleads the grasping algorithm when judging the target object's shape and results in low grasping accuracy. Humans, by contrast, can accurately grasp objects from a single view by drawing on geometric experience to estimate object shapes. Inspired by this, we propose a novel 6-DoF grasping framework that converts point completion results into object shape features used to train the 6-DoF grasp network. Here, point completion generates an approximately complete point cloud from the 2.5D points, analogous to human geometric experience, and encoding it as shape features is how we exploit it to improve grasp efficiency. Furthermore, because of the gap between network-generated proposals and actual execution, we integrate a score filter into our framework to select grasp proposals that are more executable on a real robot. This enables our method to maintain high grasp quality from any camera viewpoint. Extensive experiments demonstrate that using complete point features yields significantly more accurate grasp proposals, and that the score filter greatly enhances the reliability of real-world robot grasping. Our method achieves a success rate 17.8% higher than the state-of-the-art method in real-world experiments.
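The score filter can be pictured as an adaptive-threshold selection over predicted grasp qualities: only proposals whose score clears a threshold derived from the score distribution itself are passed to the robot. The quantile rule below is an illustrative assumption, not the paper's exact criterion, and the names are hypothetical.

```python
import numpy as np

def filter_grasps(proposals: np.ndarray, scores: np.ndarray, quantile: float = 0.8):
    """Keep only grasp proposals whose predicted quality clears an
    adaptive threshold computed from the scores themselves.

    The quantile-based threshold is a hypothetical stand-in for the
    framework's learned scoring and adaptive filtering module.
    """
    thresh = np.quantile(scores, quantile)
    keep = scores >= thresh
    return proposals[keep], scores[keep]

# Example: 5 candidate 6-DoF grasps (e.g. position + quaternion each)
grasps = np.arange(5 * 7, dtype=float).reshape(5, 7)
scores = np.array([0.1, 0.9, 0.4, 0.95, 0.2])
kept, kept_scores = filter_grasps(grasps, scores)
# Only the highest-scoring proposal survives the 0.8-quantile threshold
```

Because the threshold adapts to each scene's score distribution, the filter remains selective even when the camera viewpoint changes and absolute score ranges shift.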