🤖 AI Summary
Addressing the challenge of simultaneously ensuring safety, generalization, and long-horizon control in robotic dexterous manipulation (e.g., tool use), this work proposes a hierarchical human-robot collaborative control paradigm. First, large-scale dexterous motion primitives, such as in-hand rotation and translation, are pretrained with reinforcement learning. This data is then used to train a promptable dexterous foundation controller, which interprets human teleoperation as coarse, high-level motion commands and autonomously generates precise low-level motions. The method combines RL-generated motion data, sim-to-real transfer, and a prompt-driven hierarchical control architecture. Evaluations in simulation and on physical hardware show a 10–100× improvement in stability, measured as the duration of holding objects, along with diverse object reorientation and, for the first time, dexterous robotic use of tools such as a pen, a syringe, and a screwdriver. This is achieved without task-specific reward engineering for these tool-use tasks and without requiring the human operator to have tactile feedback.
📝 Abstract
Teaching robots dexterous manipulation skills, such as tool use, presents a significant challenge. Current approaches can be broadly categorized into two strategies: human teleoperation (for imitation learning) and sim-to-real reinforcement learning (RL). The first approach is difficult because it is hard for humans to produce safe and dexterous motions on a different embodiment without touch feedback. The second, RL-based approach struggles with the sim-to-real domain gap and requires highly task-specific reward engineering for complex tasks. Our key insight is that RL is effective at learning low-level motion primitives, while humans excel at providing coarse motion commands for complex, long-horizon tasks. Therefore, the optimal solution might be a combination of both approaches. In this paper, we introduce DexterityGen (DexGen), which uses RL to pretrain large-scale dexterous motion primitives, such as in-hand rotation or translation. We then use this learned dataset to train a dexterous foundation controller. In the real world, we use human teleoperation as a prompt to the controller to produce highly dexterous behavior. We evaluate DexGen in both simulation and the real world, demonstrating that it is a general-purpose controller that can realize input dexterous manipulation commands and that it improves stability by 10-100x across diverse tasks, measured as the duration of holding objects. Notably, with DexGen we demonstrate unprecedented dexterous skills, including diverse object reorientation and, for the first time, dexterous use of tools such as a pen, a syringe, and a screwdriver.
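The abstract does not say how the foundation controller consumes the teleoperation prompt. As a purely illustrative sketch of the general idea (a coarse human command steering a prior that only proposes bounded, "safe" low-level motions), the Python below uses best-of-N selection: sample candidate actions from a motion prior and execute the one closest to the human command. Everything here is a hypothetical assumption and not DexGen's actual method. This includes `sample_from_prior` (a uniform random stand-in for a prior learned from RL-generated primitives), the joint-space action representation, `MAX_STEP`, and the squared-error scoring.

```python
import math
import random
from typing import Callable, List

Vector = List[float]

# Hypothetical bound on per-step joint motion; stands in for "safe" motions
# that a learned prior would propose. Not a value from the paper.
MAX_STEP = 0.05


def sample_from_prior(state: Vector, n: int, rng: random.Random) -> List[Vector]:
    """Hypothetical stand-in for a learned motion prior.

    Returns n candidate joint-space deltas, each component bounded by MAX_STEP.
    A real prior would condition on `state`; this toy version ignores it.
    """
    return [[rng.uniform(-MAX_STEP, MAX_STEP) for _ in state] for _ in range(n)]


def command_error(action: Vector, command: Vector) -> float:
    """Squared distance between a candidate action and the coarse human command."""
    return sum((a - c) ** 2 for a, c in zip(action, command))


def prompted_step(
    state: Vector,
    command: Vector,
    prior: Callable[[Vector, int, random.Random], List[Vector]],
    n_candidates: int,
    rng: random.Random,
) -> Vector:
    """Best-of-N: execute the prior's candidate that best matches the command.

    The command itself is never executed directly, so an overly large or
    unsafe human input is filtered through the prior's bounded proposals.
    """
    candidates = prior(state, n_candidates, rng)
    best = min(candidates, key=lambda a: command_error(a, command))
    return [s + a for s, a in zip(state, best)]


def track_command(
    start: Vector, target: Vector, steps: int, n_candidates: int = 64, seed: int = 0
) -> List[Vector]:
    """Simulate a teleoperator who repeatedly commands 'move toward target'."""
    rng = random.Random(seed)
    state = list(start)
    trajectory = [list(state)]
    for _ in range(steps):
        # Coarse intent: the full remaining displacement, often far larger than MAX_STEP.
        command = [t - s for t, s in zip(target, state)]
        state = prompted_step(state, command, sample_from_prior, n_candidates, rng)
        trajectory.append(list(state))
    return trajectory


if __name__ == "__main__":
    target = [0.2, -0.2, 0.0, 0.1]
    traj = track_command([0.0, 0.0, 0.0, 0.0], target, steps=50)
    print("final distance to target:", math.dist(traj[-1], target))
```

The design choice illustrated is the division of labor described above: the human supplies only the direction of travel, while every executed motion comes from the prior's bounded proposals. In this toy setting, the state approaches the target without ever taking a step larger than `MAX_STEP` in any joint.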