🤖 AI Summary
Generalizable grasping with multi-fingered dexterous hands across diverse objects remains challenging: conventional reinforcement learning (RL) suffers from high-dimensional action spaces and long-horizon exploration, necessitating complex reward shaping or curriculum design. Method: We propose a general grasping method based on editing a single successful demonstration trajectory. By modeling trajectory adaptation as a one-step Markov decision process (MDP), we use RL to optimize a universal policy across hundreds of objects in parallel in simulation, then distill it into a vision-based policy through imitation learning. Our approach requires only binary success rewards and collision penalties, supports both RGB and depth inputs, and is extensible to language-guided grasping. Results: Our method achieves a 95% success rate on DexGraspNet, 84.6% average success across six unseen object datasets, and successfully grasps 110 previously unseen real-world objects, including small, thin, and fragile items, demonstrating significantly reduced reward engineering complexity and superior cross-object generalization.
📝 Abstract
Universal grasping with multi-fingered dexterous hands is a fundamental challenge in robotic manipulation. While recent approaches successfully learn closed-loop grasping policies using reinforcement learning (RL), the inherent difficulty of high-dimensional, long-horizon exploration necessitates complex reward and curriculum design, often resulting in suboptimal solutions across diverse objects. We propose DemoGrasp, a simple yet effective method for learning universal dexterous grasping. We start from a single successful demonstration trajectory of grasping a specific object and adapt it to novel objects and poses by editing the robot actions in this trajectory: changing the wrist pose determines where to grasp, and changing the hand joint angles determines how to grasp. We formulate this trajectory editing as a single-step Markov Decision Process (MDP) and use RL to optimize a universal policy across hundreds of objects in parallel in simulation, with a simple reward consisting of a binary success term and a robot-table collision penalty. In simulation, DemoGrasp achieves a 95% success rate on DexGraspNet objects using the Shadow Hand, outperforming previous state-of-the-art methods. It also shows strong transferability, achieving an average success rate of 84.6% across diverse dexterous hand embodiments on six unseen object datasets, while being trained on only 175 objects. Through vision-based imitation learning, our policy successfully grasps 110 unseen real-world objects, including small, thin items. It generalizes to spatial, background, and lighting changes, supports both RGB and depth inputs, and extends to language-guided grasping in cluttered scenes.
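The core idea of the abstract, editing a demonstration trajectory as a single-step MDP with a binary-success-plus-collision-penalty reward, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, trajectory representation (a list of 6-DoF wrist poses and per-step hand joint vectors), and the penalty weight are all assumptions.

```python
# Illustrative sketch of DemoGrasp-style trajectory editing (hypothetical API).
# A single "action" edits the whole demo trajectory: a wrist-pose offset decides
# where to grasp, and a replacement final hand configuration decides how to grasp.

def edit_trajectory(demo_wrist_poses, demo_joint_angles, wrist_delta, final_joints):
    """Apply one edit to a demonstration trajectory.

    demo_wrist_poses: list of T wrist poses, each a 6-vector (position + rotation).
    demo_joint_angles: list of T hand joint-angle vectors.
    wrist_delta: 6-vector offset applied to every wrist pose (where to grasp).
    final_joints: hand joint angles substituted at the last step (how to grasp).
    """
    edited_wrist = [[p + d for p, d in zip(pose, wrist_delta)]
                    for pose in demo_wrist_poses]
    edited_joints = demo_joint_angles[:-1] + [list(final_joints)]
    return edited_wrist, edited_joints

def reward(success, collided_with_table, collision_penalty=0.1):
    """Simple reward from the abstract: binary success term minus a
    robot-table collision penalty (penalty weight is an assumed value)."""
    return float(success) - collision_penalty * float(collided_with_table)
```

Because the MDP is single-step, the policy outputs one edit per episode and receives one scalar reward after the edited trajectory is replayed in simulation, which is what removes the long-horizon exploration burden described above.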