Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation

📅 2024-12-11
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
🤖 AI Summary
To address the long-standing lack of contact-annotated multi-hand grasping datasets and of a unified framework for semantic multi-hand grasp generation, this work introduces Multi-GraspSet, the first large-scale, contact-annotated multi-hand grasping dataset, and proposes Multi-GraspLLM, a language-guided end-to-end model for multi-hand grasp generation. The model combines point-cloud encoding, text–geometry cross-modal alignment into a unified semantic space, LLM-based sequence modeling of grasp bin tokens, and hand-aware linear decoding. It generates grasp poses under diverse conditions, including heterogeneous hand types, multiple objects, and varied natural-language instructions. Evaluations in both simulation and real-robot settings show substantial gains over prior methods and strong generalization to unseen objects, hand configurations, and task descriptions.

📝 Abstract
Multi-hand semantic grasp generation aims to generate feasible and semantically appropriate grasp poses for different robotic hands based on natural language instructions. Although the task is highly valuable, it remains a long-standing challenge due to the lack of multi-hand grasp datasets with fine-grained contact descriptions between robotic hands and objects. In this paper, we present Multi-GraspSet, the first large-scale multi-hand grasp dataset with automatic contact annotations. Based on Multi-GraspSet, we propose Multi-GraspLLM, a unified language-guided grasp generation framework that leverages large language models (LLMs) to handle variable-length sequences, generating grasp poses for diverse robotic hands in a single unified architecture. Multi-GraspLLM first aligns the encoded point-cloud features and text features into a unified semantic space. It then generates grasp bin tokens that are subsequently converted into grasp poses for each robotic hand via hand-aware linear mapping. The experimental results demonstrate that our approach significantly outperforms existing methods in both real-world experiments and simulation. More information can be found on our project page https://multi-graspllm.github.io.
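The abstract's decoding step — discrete grasp bin tokens converted to continuous poses by a hand-aware linear mapping — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bin count, value range, latent dimension, and per-hand degree-of-freedom numbers are all assumptions chosen for the example.

```python
import numpy as np

NUM_BINS = 256              # assumed number of discrete grasp bins
VALUE_RANGE = (-1.0, 1.0)   # assumed normalized pose-parameter range

def bins_to_values(bin_tokens):
    """De-quantize bin indices to continuous values (bin centers) in VALUE_RANGE."""
    lo, hi = VALUE_RANGE
    return lo + (np.asarray(bin_tokens) + 0.5) * (hi - lo) / NUM_BINS

class HandAwareDecoder:
    """One linear map per hand type, projecting a shared latent vector to
    that hand's pose parameters; different hands have different output sizes,
    which is why a single fixed-size head would not suffice."""
    def __init__(self, latent_dim, hand_dofs, seed=0):
        rng = np.random.default_rng(seed)
        self.maps = {
            name: (rng.standard_normal((dof, latent_dim)) * 0.01, np.zeros(dof))
            for name, dof in hand_dofs.items()
        }

    def decode(self, hand, latent):
        W, b = self.maps[hand]
        return W @ latent + b

# Example: a parallel gripper (7-D pose) vs. a dexterous hand (22-D),
# both decoded from the same 16 de-quantized bin tokens.
decoder = HandAwareDecoder(latent_dim=16,
                           hand_dofs={"gripper": 7, "dexterous_hand": 22})
latent = bins_to_values(np.arange(16))
pose = decoder.decode("dexterous_hand", latent)
```

The variable-length output per hand type is what lets a single LLM backbone serve heterogeneous hands: the sequence model stays shared, and only the final linear head differs.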
Problem

Research questions and friction points this paper is trying to address.

Generate feasible grasp poses for robotic hands using natural language instructions.
Address lack of multi-hand grasp datasets with detailed contact descriptions.
Propose a unified framework leveraging LLMs for diverse robotic hand grasp generation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages large language models for grasp generation
Uses unified semantic space for feature alignment
Generates grasp poses via hand-aware linear mapping
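The alignment idea in the bullets above — projecting point-cloud and text features into one semantic space — can be sketched in a CLIP-style fashion. This is an illustrative assumption about the mechanism, not the paper's actual architecture; the dimensions and random projection matrices are placeholders for learned encoders.

```python
import numpy as np

rng = np.random.default_rng(42)
PC_DIM, TXT_DIM, SHARED_DIM = 512, 768, 256  # assumed feature sizes

# Stand-ins for learned projection layers into the shared semantic space.
W_pc = rng.standard_normal((SHARED_DIM, PC_DIM)) * 0.02
W_txt = rng.standard_normal((SHARED_DIM, TXT_DIM)) * 0.02

def to_shared(feat, W):
    """Project a modality-specific feature into the shared space and
    L2-normalize it, so cross-modal similarity is a plain dot product."""
    z = W @ feat
    return z / np.linalg.norm(z)

pc_feat = rng.standard_normal(PC_DIM)    # stand-in for an encoded object point cloud
txt_feat = rng.standard_normal(TXT_DIM)  # stand-in for an encoded instruction

similarity = float(to_shared(pc_feat, W_pc) @ to_shared(txt_feat, W_txt))
```

Once both modalities live in one normalized space, instruction-conditioned grasping reduces to feeding jointly aligned tokens to the sequence model rather than fusing raw heterogeneous features.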