T(R,O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dexterous grasping has long suffered from poor generalization, low efficiency, and limited cross-manipulator adaptability due to the difficulty of modeling high-dimensional state-action spaces. To address this, we propose the first graph diffusion model-based framework for cross-manipulator grasp generation. Our method introduces a novel T(R,O) graph representation that unifies hand-object spatial transformations and geometric relationships. By tightly coupling graph neural networks with a lightweight inverse kinematics solver, the framework supports both unconditional and conditional grasp generation. It enables efficient parallel inference and closed-loop manipulation. Evaluated across multiple dexterous manipulators, our approach achieves an average success rate of 94.83% with a single-inference latency of only 0.21 seconds (throughput: 41 Hz), significantly outperforming state-of-the-art methods. It further demonstrates strong generalization, low memory footprint (<1.2 GB GPU memory), and hardware deployability—making it suitable for real-time robotic applications.
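To make the T(R,O) representation concrete, here is a minimal sketch (not the paper's code) of a graph whose nodes carry geometric features of the hand and object and whose edges store relative SE(3) transforms between them. The names `se3` and `TROGraph` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

class TROGraph:
    """Toy hand-object graph: nodes hold geometric features, edges hold transforms.

    Edge (src, dst) stores T with x_dst = T @ x_src; the inverse edge is kept
    automatically so relative poses can be composed in either direction.
    """

    def __init__(self):
        self.nodes = {}   # name -> feature vector
        self.edges = {}   # (src, dst) -> 4x4 relative transform

    def add_node(self, name, features):
        self.nodes[name] = np.asarray(features, dtype=float)

    def add_edge(self, src, dst, T):
        self.edges[(src, dst)] = T
        self.edges[(dst, src)] = np.linalg.inv(T)

    def relative(self, src, mid, dst):
        """Compose src -> mid -> dst into a single src -> dst transform."""
        return self.edges[(mid, dst)] @ self.edges[(src, mid)]
```

For example, composing a palm-to-wrist edge with a wrist-to-object edge yields the palm-to-object pose directly, which is the kind of relational query a graph representation of hand-object transforms makes cheap.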

📝 Abstract
Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action space. We introduce T(R,O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R,O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditional and conditional grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R,O) Grasp achieves an average success rate of 94.83%, an inference latency of 0.21 s, and a throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizable across embodiments while significantly reducing memory consumption. More importantly, the high inference speed enables closed-loop dexterous manipulation, underscoring the potential of T(R,O) Grasp to scale into a foundation model for dexterous grasping.
Problem

Research questions and friction points this paper is trying to address.

How to efficiently generate diverse grasps across multiple robotic hands
How to model spatial transformations between robotic hands and objects
How to enable closed-loop dexterous manipulation at high inference speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based framework generates cross-embodiment dexterous grasps
T(R,O) Graph models robot-object spatial transformations and geometry
Graph diffusion with IK solver enables fast conditioned grasp synthesis
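The sampling side of this pipeline can be illustrated with a generic DDPM-style reverse process over a grasp pose parameter vector, whose denoised output would then be handed to an IK solver. This is a hedged sketch under standard diffusion assumptions: the paper's graph neural network is replaced by a stand-in noise predictor, and all names, step counts, and schedules here are illustrative, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule (assumption, not the paper's schedule).
T_STEPS = 50
betas = np.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def dummy_noise_predictor(x, t):
    """Stand-in for the graph neural network epsilon-predictor."""
    return 0.1 * x

def reverse_diffusion(dim=6):
    """Run the standard DDPM reverse process from Gaussian noise."""
    x = rng.standard_normal(dim)
    for t in reversed(range(T_STEPS)):
        eps = dummy_noise_predictor(x, t)
        # DDPM posterior mean update.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Inject noise at every step except the last.
            x += np.sqrt(betas[t]) * rng.standard_normal(dim)
    return x
```

In the paper's framework the denoised quantities are edge transforms of the T(R,O) graph rather than a flat 6-vector, and the learned GNN replaces the stand-in predictor; the loop structure, however, is the generic one shown here.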
Authors

Xin Fei, National University of Singapore (Robotic Manipulation · Computer Vision)
Zhixuan Xu, National University of Singapore (Robotics)
Huaicong Fang, College of Control Science and Engineering, Zhejiang University
Tianrui Zhang, Department of Computer Science, National University of Singapore
Lin Shao, Department of Computer Science, National University of Singapore