Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

📅 2025-03-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of next-best end-effector pose (NBP) prediction in bimanual imitation learning: self-collision and kinematic infeasibility. The paper proposes KStar Diffuser, a spatial-temporal graph diffusion policy that accounts for both the physical robot structure and its kinematic constraints. The method maintains a dynamic spatial-temporal graph built from the joint motions of both arms over consecutive timesteps and uses it as a robot-structure condition when denoising actions. It also introduces differentiable kinematics as a reference during optimization, which regularizes the policy toward end-effector poses that the robot joints can actually realize. Experiments in simulation and in the real world indicate that the method makes effective use of structural information and generates kinematics-aware actions.

📝 Abstract

Despite the significant success of imitation learning in robotic manipulation, its application to bimanual tasks remains highly challenging. Existing approaches mainly learn a policy to predict a distant next-best end-effector pose (NBP) and then compute the corresponding joint rotation angles for motion using inverse kinematics. However, they suffer from two important issues: (1) rarely considering the physical robotic structure, which may cause self-collisions or interferences, and (2) overlooking the kinematic constraints, which may result in the predicted poses not conforming to the actual limitations of the robot joints. In this paper, we propose Kinematics enhanced Spatial-TemporAl gRaph Diffuser (KStar Diffuser). Specifically, (1) to incorporate the physical robot structure information into action prediction, KStar Diffuser maintains a dynamic spatial-temporal graph according to the physical bimanual joint motions at continuous timesteps. This dynamic graph serves as the robot-structure condition for denoising the actions; (2) to make the NBP learning objective consistent with kinematics, we introduce differentiable kinematics to provide the reference for optimizing KStar Diffuser. This module regularizes the policy to predict more reliable and kinematics-aware next end-effector poses. Experimental results show that our method effectively leverages the physical structural information and generates kinematics-aware actions in both simulation and real-world settings.
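
As a rough illustration of what a spatial-temporal graph over bimanual joints could look like, the sketch below builds a toy graph in plain Python. It is not the paper's construction: the function name build_st_graph, the node layout (timestep, arm, joint), and the extra edge linking the two arm bases are all assumptions made for this example.

```python
def build_st_graph(num_steps, joints_per_arm, arms=("left", "right")):
    """Build a toy spatial-temporal graph (hypothetical layout).

    Nodes are (timestep, arm, joint_index) tuples. Edges are
    (source_id, target_id, kind) with kind in {"spatial", "temporal"}.
    """
    nodes = [
        (t, arm, j)
        for t in range(num_steps)
        for arm in arms
        for j in range(joints_per_arm)
    ]
    index = {node: i for i, node in enumerate(nodes)}
    edges = []

    for t in range(num_steps):
        # Spatial edges follow each arm's kinematic chain at a fixed timestep.
        for arm in arms:
            for j in range(joints_per_arm - 1):
                edges.append((index[(t, arm, j)], index[(t, arm, j + 1)], "spatial"))
        # Assumption: couple the arms through an edge between their base joints.
        for a, b in zip(arms, arms[1:]):
            edges.append((index[(t, a, 0)], index[(t, b, 0)], "spatial"))

    # Temporal edges link the same joint across consecutive timesteps.
    for t in range(num_steps - 1):
        for arm in arms:
            for j in range(joints_per_arm):
                edges.append((index[(t, arm, j)], index[(t + 1, arm, j)], "temporal"))

    return nodes, edges


nodes, edges = build_st_graph(num_steps=3, joints_per_arm=7)
print(len(nodes), len(edges))
```

In a learned policy, node features would typically carry joint states and the edge list would feed a graph neural network whose output conditions the denoiser; those parts are omitted here.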

Problem

Research questions and friction points this paper is trying to address.

Self-collisions and interference in bimanual manipulation when the physical robot structure is ignored

Predicted end-effector poses that do not conform to the kinematic limits of the robot joints

Imitation learning policies that predict actions without awareness of the physical structure
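
To make the kinematic issue concrete, here is a minimal sketch with a planar two-link arm, which is a deliberate simplification and not the robot used in the paper. The helper names two_link_ik and within_limits, the unit link lengths, and the joint limits are all hypothetical. The example shows a target pose that is geometrically reachable but whose inverse-kinematics solution falls outside the assumed joint limits.

```python
import math


def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Analytic IK for a planar two-link arm (one of the two elbow branches).

    Returns (q1, q2) in radians, or None if the target is out of reach.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        return None  # target lies outside the reachable workspace
    q2 = math.acos(c2)  # the mirrored branch uses -q2
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def within_limits(q, limits):
    """Check every joint angle against its (lower, upper) bound."""
    return all(lo <= qi <= hi for qi, (lo, hi) in zip(q, limits))


# Assumed joint limits for illustration only.
limits = [(-math.pi, math.pi), (-math.pi / 2, math.pi / 2)]

q = two_link_ik(0.0, 1.0)
print(q, within_limits(q, limits))
```

For the target (0, 1) the elbow angle has magnitude 2π/3 on either branch, so both solutions exceed the assumed ±π/2 elbow limit even though the pose is within reach. A policy that predicts poses without regard to such limits can therefore produce targets the robot cannot execute.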

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic spatial-temporal graph for robot-structure condition

Differentiable kinematics for reliable pose prediction

Kinematics-aware action generation in bimanual tasks
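
One way to picture a kinematics-based regularizer is as a consistency term between a predicted end-effector position and the forward kinematics of predicted joint angles. The sketch below does this for a planar serial arm in plain Python. It is an assumption-laden illustration, not the paper's loss: the function names, the squared-error form, and the weight value are hypothetical, and a real implementation would use an autodiff framework so that gradients flow through the kinematics.

```python
import math


def forward_kinematics(q, link_lengths=(1.0, 1.0)):
    """End-effector (x, y) of a planar serial arm with joint angles q."""
    x = y = 0.0
    angle = 0.0
    for qi, li in zip(q, link_lengths):
        angle += qi  # joint angles accumulate along the chain
        x += li * math.cos(angle)
        y += li * math.sin(angle)
    return x, y


def kinematic_consistency_loss(pred_pose, pred_q, target_pose, weight=0.5):
    """Imitation error plus a weighted pose-versus-FK consistency penalty."""
    fk_pose = forward_kinematics(pred_q)
    imitation = sum((p - t) ** 2 for p, t in zip(pred_pose, target_pose))
    consistency = sum((p - f) ** 2 for p, f in zip(pred_pose, fk_pose))
    return imitation + weight * consistency


# Pose matches the demonstration but disagrees with the predicted joints.
loss = kinematic_consistency_loss(
    pred_pose=(2.0, 0.0), pred_q=(math.pi / 2, 0.0), target_pose=(2.0, 0.0)
)
print(loss)
```

The penalty is zero only when the predicted pose agrees with what the predicted joint configuration would actually reach, which is the sense in which such a term pushes pose predictions toward kinematic consistency.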
