🤖 AI Summary
This work addresses the limited generalizability of existing dexterous manipulation policies, which rely on fixed hand structures and therefore fail to transfer across diverse hand morphologies. To overcome this, the authors propose a parameterized universal representation that unifies the morphology and action spaces of diverse dexterous hands, built from a canonical URDF format and a shared parameter space over which a structured latent manifold can be learned. On top of this representation, they train a variational autoencoder to obtain compact, semantically rich embeddings and introduce a generalizable grasping-policy training framework that enables zero-shot transfer across hand types. Experiments in both simulation and the real world demonstrate successful zero-shot transfer to unseen hand morphologies, including an 81.9% grasping success rate on the three-fingered LEAP Hand.
📝 Abstract
Dexterous manipulation policies today largely assume fixed hand designs, severely restricting their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over our space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., 81.9% zero-shot success rate on 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation.