🤖 AI Summary
This work addresses the challenge of policy generalization for multi-fingered dexterous hands, which operate under high degrees of freedom and complex contact conditions. The authors propose DexRep, a novel representation that integrates object surface geometry with hand-object spatial relationships to construct an interaction representation that is both geometrically aware and relationally explicit. Leveraging this representation within a deep reinforcement learning framework, the trained policy achieves a grasp success rate of 87.9% on over 5,000 previously unseen objects, despite being trained on only 40 objects. The approach also yields significant gains of 20% to 40% on in-hand reorientation and bimanual handover tasks, and exhibits strong real-world deployability with a minimal sim-to-real gap.
📝 Abstract
Robotic dexterous manipulation is a challenging problem due to the high degrees of freedom (DoFs) and complex contacts of multi-fingered robotic hands. Many existing deep reinforcement learning (DRL) based methods aim at improving sample efficiency in high-dimensional output action spaces. However, existing works often overlook the role of representations in generalizing a manipulation policy over the complex input space of hand-object interaction. In this paper, we propose DexRep, a novel hand-object interaction representation that captures object surface features and the spatial relations between hands and objects for dexterous manipulation skill learning. Based on DexRep, policies are learned for three dexterous manipulation tasks, i.e., grasping, in-hand reorientation, and bimanual handover, and extensive experiments are conducted to verify their effectiveness. In simulation, the grasping policy learned with 40 objects achieves a success rate of 87.9% on more than 5,000 unseen objects of diverse categories, significantly surpassing existing work trained with thousands of objects; for the in-hand reorientation and handover tasks, the policies also improve success rates and other metrics over existing hand-object representations by 20% to 40%. The grasp policies with DexRep are deployed in the real world under multi-camera and single-camera setups and exhibit a small sim-to-real gap.
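To make the idea of a "geometry-plus-relation" interaction representation concrete, here is a minimal illustrative sketch, not the paper's actual DexRep architecture: for each hand keypoint, it pairs the vector to the nearest object-surface point (the spatial relation) with the coordinates of that surface point (a crude proxy for local surface geometry). All names and the feature layout are hypothetical.

```python
import numpy as np

def interaction_features(surface_pts, hand_pts):
    """Toy hand-object interaction features (illustrative only).

    surface_pts: (S, 3) sampled object surface points.
    hand_pts:    (H, 3) hand keypoint positions.
    Returns a flat vector of length H * 6: for each hand keypoint,
    the hand-to-surface relation vector and the nearest surface point.
    """
    # Pairwise offsets between every hand keypoint and surface point
    diff = hand_pts[:, None, :] - surface_pts[None, :, :]   # (H, S, 3)
    dist = np.linalg.norm(diff, axis=-1)                    # (H, S)
    nearest = dist.argmin(axis=1)                           # (H,)
    rel_vecs = surface_pts[nearest] - hand_pts              # hand -> surface (relation)
    geo_feats = surface_pts[nearest]                        # nearest-point coords (geometry proxy)
    # Concatenate per-keypoint relation and geometry features, then flatten
    return np.concatenate([rel_vecs, geo_feats], axis=1).reshape(-1)
```

A learned policy would consume such a vector alongside proprioception; the paper's representation additionally encodes richer local surface features than the bare coordinates used here.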