🤖 AI Summary
Dexterous grasp generation must ensure stability, task adaptability, and cross-object generalization at the same time. To address this, we propose a transfer framework based on conditional diffusion models. Our approach introduces a tripartite object-centric representation (contact, part, and direction maps) coupled with a dual mapping mechanism that models the geometric relationships between a shape template and a novel object. A cascaded conditional diffusion architecture transfers the three maps jointly and keeps them mutually consistent, enabling fine-grained grasp transfer. Shape-template guidance and a robust grasp recovery module further support high-fidelity grasp synthesis for novel objects within the same category and across diverse manipulation tasks. Extensive experiments demonstrate that our method outperforms existing analytical and generative approaches in grasp quality, generation efficiency, and generalization across objects and tasks, unifying stability, adaptability, and generalizability within a single framework.
📝 Abstract
Dexterous grasp generation is a fundamental challenge in robotics, requiring both grasp stability and adaptability across diverse objects and tasks. Analytical methods ensure stable grasps but are inefficient and lack task adaptability, while generative approaches improve efficiency and task integration but generalize poorly to unseen objects and tasks due to data limitations. In this paper, we propose a transfer-based framework for dexterous grasp generation that leverages a conditional diffusion model to transfer high-quality grasps from shape templates to novel objects within the same category. Specifically, we reformulate the grasp transfer problem as the generation of an object contact map, incorporating object shape similarity and task specifications into the diffusion process. To handle complex shape variations, we introduce a dual mapping mechanism that captures the intricate geometric relationships between shape templates and novel objects. Beyond the contact map, we derive two additional object-centric maps, the part map and the direction map, to encode finer contact details for more stable grasps. We then develop a cascaded conditional diffusion framework that transfers these three maps jointly and keeps them mutually consistent. Finally, we introduce a robust grasp recovery mechanism that identifies reliable contact points and optimizes grasp configurations efficiently. Extensive experiments demonstrate the superiority of the proposed method, which effectively balances grasp quality, generation efficiency, and generalization performance across various tasks. Project homepage: https://cmtdiffusion.github.io/
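To make the cascaded structure concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The abstract does not specify the cascade order, the map encodings, or the denoising network, so everything below is an assumption made for illustration: the stage order (contact, then part, then direction), the one-scalar-per-point map encoding, and the hypothetical names `toy_sample` and `cascaded_transfer`. A hand-written update stands in for the learned denoiser.

```python
import random
from typing import Dict, List

# Assumption: each map is one scalar per object surface point (toy encoding).
Map = List[float]


def toy_sample(conds: List[Map], num_steps: int = 10, seed: int = 0) -> Map:
    """Toy reverse process (hypothetical stand-in for a learned denoiser).

    Starts from Gaussian noise and moves step by step toward the per-point
    mean of the conditioning maps. A real model would predict each update
    with a trained network instead of this closed-form rule.
    """
    rng = random.Random(seed)
    n = len(conds[0])
    target = [sum(c[i] for c in conds) / len(conds) for i in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for t in range(num_steps):
        w = 1.0 / (num_steps - t)  # step weight grows to 1 at the last step
        x = [xi + w * (ti - xi) for xi, ti in zip(x, target)]
    return x


def cascaded_transfer(
    template_contact: Map, template_part: Map, template_direction: Map
) -> Dict[str, Map]:
    """Generate three maps in sequence; later stages see earlier outputs.

    Assumed order: contact -> part -> direction. Conditioning each stage on
    the maps already generated is what keeps the three outputs consistent.
    """
    contact = toy_sample([template_contact], seed=0)
    part = toy_sample([template_part, contact], seed=1)
    direction = toy_sample([template_direction, contact, part], seed=2)
    return {"contact": contact, "part": part, "direction": direction}


maps = cascaded_transfer([0.0, 1.0, 0.5], [1.0, 1.0, 0.0], [0.2, 0.4, 0.6])
```

The point of the sketch is only the data flow: each later stage receives the maps produced before it as extra conditioning, which is one plausible way a cascade can keep its outputs consistent. In the framework described above, the conditioning would also include shape similarity and task specifications.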