AI Summary
Robots must accurately grasp specific object parts based on natural language instructions to enable human-robot collaboration and tool manipulation, yet existing approaches struggle with open-vocabulary part-level grasping. This paper introduces AnyPart, the first open-vocabulary part-level grasping framework, integrating GLIP (open-vocabulary object detection), MaskCLIP (part-aware segmentation), and a geometry-aware 6-DoF grasp pose regression network. The authors further construct the first manually annotated part-level segmentation dataset (1,014 samples) and a real-world part-grasping dataset. The system localizes target parts and predicts full 6-DoF grasp poses within 800 ms. Evaluated on 28 household object categories across 360 physical trials, it achieves a grasp success rate of 69.52% and a part localization accuracy of 88.57%, significantly outperforming baseline methods.
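The three-stage pipeline described above (open-vocabulary detection, then part segmentation, then 6-DoF grasp regression) can be sketched as a simple function composition. This is a minimal, hypothetical sketch of the data flow only: the function names, box/mask representations, and the centroid-based grasp stub are illustrative assumptions, not the paper's actual API or models.

```python
# Hypothetical sketch of an AnyPart-style pipeline. Each stage is a stub
# standing in for the real model (GLIP, MaskCLIP, grasp regressor); only
# the data flow (instruction -> box -> part pixels -> 6-DoF pose) is real.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Grasp:
    position: Tuple[float, float, float]            # x, y, z
    orientation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    score: float

def detect_object(rgb, object_query: str) -> Tuple[int, int, int, int]:
    """Stand-in for open-vocabulary detection (GLIP in the paper):
    returns a bounding box (x0, y0, x1, y1) for the queried object."""
    return (10, 10, 90, 90)  # dummy box

def segment_part(rgb, box, part_query: str) -> List[Tuple[int, int]]:
    """Stand-in for part-aware segmentation (MaskCLIP in the paper):
    returns pixel coordinates of the queried part inside the box."""
    x0, y0, _, _ = box
    return [(x, y) for x in range(x0, x0 + 5) for y in range(y0, y0 + 5)]

def predict_grasp(depth, part_pixels) -> Grasp:
    """Stand-in for the 6-DoF grasp regressor: here we simply place a
    grasp at the centroid of the part pixels with identity orientation."""
    cx = sum(p[0] for p in part_pixels) / len(part_pixels)
    cy = sum(p[1] for p in part_pixels) / len(part_pixels)
    return Grasp((cx, cy, 0.0), (1.0, 0.0, 0.0, 0.0), score=1.0)

def grasp_part(rgb, depth, instruction: Tuple[str, str]) -> Grasp:
    """Full pipeline for an instruction like ("mug", "handle")."""
    object_query, part_query = instruction
    box = detect_object(rgb, object_query)
    pixels = segment_part(rgb, box, part_query)
    return predict_grasp(depth, pixels)

grasp = grasp_part(rgb=None, depth=None, instruction=("mug", "handle"))
print(grasp.position)  # centroid of the dummy part pixels
```

The point of the sketch is the interface between stages: the detector narrows the search to one object, the segmenter restricts grasp candidates to the named part, and the regressor only ever sees part pixels, which is what makes the final pose part-specific.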
Abstract
Many robotic applications require grasping objects not arbitrarily but at a very specific object part. This is especially important for manipulation tasks beyond simple pick-and-place scenarios and for robot-human interactions such as object handovers. We propose AnyPart, a practical system that combines open-vocabulary object detection, open-vocabulary part segmentation, and 6-DoF grasp pose prediction to infer a grasp pose on a specific part of an object in 800 milliseconds. We contribute two new datasets for the task of open-vocabulary part-based grasping: a hand-segmented dataset containing 1,014 object-part segmentations, and a dataset of real-world scenarios gathered during our robot trials for individual objects and table-clearing tasks. We evaluate AnyPart on a mobile manipulator robot using a set of 28 common household objects over 360 grasping trials. AnyPart produces successful grasps 69.52% of the time; when ignoring robot-based grasp failures, it predicts a grasp location on the correct part 88.57% of the time.