T(R,O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dexterous grasping has long suffered from poor generalization, low efficiency, and limited cross-manipulator adaptability due to the difficulty of modeling high-dimensional state-action spaces. To address this, we propose the first graph diffusion model-based framework for cross-manipulator grasp generation. Our method introduces a novel T(R,O) graph representation that unifies hand-object spatial transformations and geometric relationships. By tightly coupling graph neural networks with a lightweight inverse kinematics solver, the framework supports both unconditional and conditional grasp generation. It enables efficient parallel inference and closed-loop manipulation. Evaluated across multiple dexterous manipulators, our approach achieves an average success rate of 94.83% with a single-inference latency of only 0.21 seconds (throughput: 41 Hz), significantly outperforming state-of-the-art methods. It further demonstrates strong generalization, low memory footprint (<1.2 GB GPU memory), and hardware deployability—making it suitable for real-time robotic applications.
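To make the T(R,O) representation concrete, here is a minimal sketch (not the paper's code) of a graph whose nodes carry geometric features of the hand and object and whose edges store relative SE(3) transforms between them. The names `se3` and `TROGraph` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

class TROGraph:
    """Toy hand-object graph: nodes hold geometric features, edges hold transforms.

    Edge (src, dst) stores T with x_dst = T @ x_src; the inverse edge is kept
    automatically so relative poses can be composed in either direction.
    """

    def __init__(self):
        self.nodes = {}   # name -> feature vector
        self.edges = {}   # (src, dst) -> 4x4 relative transform

    def add_node(self, name, features):
        self.nodes[name] = np.asarray(features, dtype=float)

    def add_edge(self, src, dst, T):
        self.edges[(src, dst)] = T
        self.edges[(dst, src)] = np.linalg.inv(T)

    def relative(self, src, mid, dst):
        """Compose src -> mid -> dst into a single src -> dst transform."""
        return self.edges[(mid, dst)] @ self.edges[(src, mid)]
```

For example, composing a palm-to-wrist edge with a wrist-to-object edge yields the palm-to-object pose directly, which is the kind of relational query a graph representation of hand-object transforms makes cheap.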

📝 Abstract
Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action space. We introduce T(R,O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R,O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditional and conditional grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R,O) Grasp achieves an average success rate of 94.83%, an inference latency of 0.21 s, and a throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizable across embodiments while significantly reducing memory consumption. More importantly, the high inference speed enables closed-loop dexterous manipulation, underscoring the potential of T(R,O) Grasp to scale into a foundation model for dexterous grasping.
Problem

Research questions and friction points this paper is trying to address.

How to efficiently generate diverse grasps across multiple robotic hands
How to model spatial transformations between robotic hands and objects
How to enable closed-loop dexterous manipulation at high inference speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based framework generates cross-embodiment dexterous grasps
T(R,O) Graph models robot-object spatial transformations and geometry
Graph diffusion with IK solver enables fast conditioned grasp synthesis
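The sampling side of this pipeline can be illustrated with a generic DDPM-style reverse process over a grasp pose parameter vector, whose denoised output would then be handed to an IK solver. This is a hedged sketch under standard diffusion assumptions: the paper's graph neural network is replaced by a stand-in noise predictor, and all names, step counts, and schedules here are illustrative, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule (assumption, not the paper's schedule).
T_STEPS = 50
betas = np.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def dummy_noise_predictor(x, t):
    """Stand-in for the graph neural network epsilon-predictor."""
    return 0.1 * x

def reverse_diffusion(dim=6):
    """Run the standard DDPM reverse process from Gaussian noise."""
    x = rng.standard_normal(dim)
    for t in reversed(range(T_STEPS)):
        eps = dummy_noise_predictor(x, t)
        # DDPM posterior mean update.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Inject noise at every step except the last.
            x += np.sqrt(betas[t]) * rng.standard_normal(dim)
    return x
```

In the paper's framework the denoised quantities are edge transforms of the T(R,O) graph rather than a flat 6-vector, and the learned GNN replaces the stand-in predictor; the loop structure, however, is the generic one shown here.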
Authors

Xin Fei, National University of Singapore (Robotic Manipulation · Computer Vision)
Zhixuan Xu, National University of Singapore (Robotics)
Huaicong Fang, College of Control Science and Engineering, Zhejiang University
Tianrui Zhang, Department of Computer Science, National University of Singapore
Lin Shao, Department of Computer Science, National University of Singapore