UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

📅 2024-12-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Training dexterous robotic grasping policies is complex, generalizes poorly, and is hard to scale to large, heterogeneous object sets. Method: We propose UniGraspTransformer, a unified dexterous grasping policy network based on the Transformer architecture. Object-specific policies are first trained with reinforcement learning and then distilled into a single unified model via behavior cloning; this “simplified policy distillation” replaces the multi-step training pipelines of prior methods such as UniDexGrasp++. The model scales to up to 12 self-attention blocks and is evaluated in both state-based and vision-based settings. Results: In the vision-based setting, UniGraspTransformer improves grasp success rate over UniDexGrasp++ by 3.5%, 7.7%, and 10.1% (absolute) on seen objects, unseen objects within seen categories, and completely unseen objects, respectively, indicating stronger generalization across objects and categories.

📝 Abstract

We introduce UniGraspTransformer, a universal Transformer-based network for dexterous robotic grasping that simplifies training while enhancing scalability and performance. Unlike prior methods such as UniDexGrasp++, which require complex, multi-step training pipelines, UniGraspTransformer follows a streamlined process: first, dedicated policy networks are trained for individual objects using reinforcement learning to generate successful grasp trajectories; then, these trajectories are distilled into a single, universal network. Our approach enables UniGraspTransformer to scale effectively, incorporating up to 12 self-attention blocks for handling thousands of objects with diverse poses. Additionally, it generalizes well to both idealized and real-world inputs, evaluated in state-based and vision-based settings. Notably, UniGraspTransformer generates a broader range of grasping poses for objects of various shapes and orientations, resulting in more diverse grasp strategies. Experimental results demonstrate significant improvements over the state-of-the-art method, UniDexGrasp++, across various object categories, achieving success rate gains of 3.5%, 7.7%, and 10.1% on seen objects, unseen objects within seen categories, and completely unseen objects, respectively, in the vision-based setting. Project page: https://dexhand.github.io/UniGraspTransformer.
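The two-stage process described in the abstract (per-object teachers produce successful trajectories, which are then distilled into one shared policy) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the teachers are hand-written proportional controllers rather than RL-trained networks, the dynamics are a one-dimensional toy, the "object" is a single scalar feature, and the student is a two-weight linear model standing in for the Transformer. Only the overall shape (roll out teachers, keep successful trajectories, fit one student by supervised regression on state-action pairs) follows the text.

```python
import random


def teacher_action(state, obj_feature):
    # Hypothetical stand-in for an object-specific RL policy:
    # a proportional controller whose gain depends on the object.
    return -(1.0 + obj_feature) * state


def rollout(obj_feature, rng, steps=5):
    """Run the teacher on one object; return (samples, success flag)."""
    state = rng.uniform(-1.0, 1.0)
    samples = []
    for _ in range(steps):
        action = teacher_action(state, obj_feature)
        samples.append((state, obj_feature, action))
        # Toy noisy dynamics (an assumption), so that some rollouts fail.
        state = state + 0.5 * action + rng.gauss(0.0, 0.05)
    return samples, abs(state) < 0.1


def collect_successful(obj_features, rollouts_per_object, rng):
    """Stage 1: keep state-action pairs from successful trajectories only."""
    dataset = []
    for f in obj_features:
        for _ in range(rollouts_per_object):
            samples, success = rollout(f, rng)
            if success:
                dataset.extend(samples)
    return dataset


def student_action(weights, state, obj_feature):
    # One shared student conditioned on the object feature
    # (a linear model here, not a Transformer).
    return weights[0] * state + weights[1] * obj_feature * state


def distill(dataset, epochs=1000, lr=2.0):
    """Stage 2: behavior cloning by full-batch gradient descent on squared error."""
    weights = [0.0, 0.0]
    n = len(dataset)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for state, f, action in dataset:
            err = student_action(weights, state, f) - action
            grad[0] += err * state / n
            grad[1] += err * f * state / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


rng = random.Random(0)
dataset = collect_successful([0.0, 0.5, 1.0], rollouts_per_object=20, rng=rng)
weights = distill(dataset)
print("samples kept:", len(dataset))
print("student weights:", weights)
```

Because every toy teacher shares the same functional form, the single student can reproduce all of them from the pooled data. In the setting the abstract describes, the student would instead be the Transformer network and the inputs would be state-based or vision-based observations.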

Problem

Research questions and friction points this paper is trying to address.

Training pipelines for dexterous robotic grasping are complex and multi-step

Existing universal policies are hard to scale to thousands of objects with diverse poses

Grasp success rates degrade on unseen objects and unseen object categories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based network for robotic grasping

Simplified training with policy distillation

Scales to thousands of diverse objects
