🤖 AI Summary
Dexterous grasping has long suffered from poor generalization, low efficiency, and limited cross-manipulator adaptability because high-dimensional state-action spaces are difficult to model. To address this, we propose the first graph diffusion model-based framework for cross-manipulator grasp generation. Our method introduces a novel T(R,O) graph representation that unifies hand-object spatial transformations and geometric relationships. By tightly coupling graph neural networks with a lightweight inverse kinematics solver, the framework supports both unconditional and conditional grasp generation, and it enables efficient parallel inference and closed-loop manipulation. Evaluated across multiple dexterous manipulators, our approach achieves an average success rate of 94.83%, an inference time of 0.21 seconds, and a throughput of 41 grasps per second, significantly outperforming state-of-the-art methods. It also shows strong generalization, a low memory footprint (<1.2 GB GPU memory), and deployability on real hardware, making it suitable for real-time robotic applications.
📝 Abstract
Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action space. We introduce T(R,O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R,O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditioned and conditioned grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R,O) Grasp achieves an average success rate of 94.83%, an inference time of 0.21 s, and a throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizes across embodiments while significantly reducing memory consumption. More importantly, the high inference speed enables closed-loop dexterous manipulation, underscoring the potential of T(R,O) Grasp to scale into a foundation model for dexterous grasping.
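The abstract does not specify how the T(R,O) Graph is built. As a rough illustration of the general idea of a graph whose edges carry hand-object spatial transformations, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the names `HandObjectGraph`, `build_hand_object_graph` and `relative_transform` are hypothetical, as are the choice of one node per hand link plus one object node, the use of the object's point-cloud centroid as a node feature, and the 4x4 homogeneous relative pose as each edge feature. This is not the authors' implementation.

```python
from dataclasses import dataclass, field

import numpy as np


def make_transform(rotation, translation) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def invert_transform(T: np.ndarray) -> np.ndarray:
    """Closed-form inverse of a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def relative_transform(T_world_a: np.ndarray, T_world_b: np.ndarray) -> np.ndarray:
    """Pose of frame b expressed in frame a (hypothetical edge feature)."""
    return invert_transform(T_world_a) @ T_world_b


@dataclass
class HandObjectGraph:
    """Hypothetical hand-object graph: link nodes, one object node, link->object edges."""
    node_types: list = field(default_factory=list)
    node_features: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst, 4x4 relative transform)


def build_hand_object_graph(link_poses, object_pose, object_points) -> HandObjectGraph:
    g = HandObjectGraph()
    for T_link in link_poses:
        g.node_types.append("link")
        g.node_features.append(T_link[:3, 3].copy())  # link origin in the world frame
    obj_idx = len(link_poses)
    g.node_types.append("object")
    g.node_features.append(object_points.mean(axis=0))  # centroid as a crude geometric descriptor
    for i, T_link in enumerate(link_poses):
        # Each edge stores the object pose relative to that hand link.
        g.edges.append((i, obj_idx, relative_transform(T_link, object_pose)))
    return g


# Usage: two hand links and an object at the world origin.
links = [
    make_transform(np.eye(3), [1.0, 0.0, 0.0]),
    make_transform(np.eye(3), [0.0, 2.0, 0.0]),
]
obj = make_transform(np.eye(3), [0.0, 0.0, 0.0])
pts = np.random.default_rng(0).normal(size=(128, 3))
g = build_hand_object_graph(links, obj, pts)
print(len(g.node_types), len(g.edges))  # 3 2
```

Storing relative poses on the edges, rather than absolute world poses, keeps the edge features unchanged when the whole hand-object scene is moved rigidly. In a real system the node features would be replaced by learned geometric encodings, and the graph would feed a graph neural network.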