RoTri-Diff: A Spatial Robot-Object Triadic Interaction-Guided Diffusion Model for Bimanual Manipulation

📅 2026-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to dual-arm cooperative manipulation often neglect the dynamic geometric relationships between the robotic arms and the manipulated object, leading to collisions, unstable grasps, and degraded performance. This work proposes a robot-object triadic interaction (RoTri) representation that, for the first time, explicitly models the triadic spatial constraints among the two arms and the object. The RoTri representation is integrated into a diffusion-based hierarchical imitation learning framework that jointly optimizes key poses and object motion, generating coordinated and stable dual-arm trajectories. The proposed method outperforms the current state of the art by 10.2% across 11 RLBench2 tasks and executes robustly on four complex real-world dual-arm manipulation tasks.

📝 Abstract
Bimanual manipulation is a fundamental robotic skill that requires continuous and precise coordination between two arms. While imitation learning (IL) is the dominant paradigm for acquiring this capability, existing approaches, whether robot-centric or object-centric, often overlook the dynamic geometric relationship among the two arms and the manipulated object. This limitation frequently leads to inter-arm collisions, unstable grasps, and degraded performance in complex tasks. To address this, we explicitly model the Robot-Object Triadic Interaction (RoTri) in bimanual systems by encoding the relative 6D poses between the two arms and the object, which captures their spatial triadic relationship and establishes continuous triangular geometric constraints. Building on this representation, we introduce RoTri-Diff, a diffusion-based imitation learning framework that combines RoTri constraints with robot keyposes and object motion in a hierarchical diffusion process. This enables the generation of stable, coordinated trajectories and robust execution across different modes of bimanual manipulation. Extensive experiments show that our approach outperforms state-of-the-art baselines by 10.2% on 11 representative RLBench2 tasks and achieves stable performance on 4 challenging real-world bimanual tasks. Project website: https://rotri-diff.github.io/.
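
To make the idea of relative 6D poses among two arms and an object concrete, the following is a minimal sketch and not the paper's implementation. It assumes each pose is available as a 4x4 homogeneous transform in a shared world frame; the function names (`make_pose`, `invert_pose`, `relative_pose`, `triadic_relative_poses`) and the example numbers are hypothetical. It computes the three pairwise relative transforms and shows that they close into a triangle, which is one plausible reading of a "triangular geometric constraint".

```python
import numpy as np


def make_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous pose from a yaw angle and a translation.

    Illustrative helper only; a real system would use full 3D rotations.
    """
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose


def invert_pose(pose):
    """Invert a rigid transform: rotation R^T, translation -R^T t."""
    inverse = np.eye(4)
    rotation_t = pose[:3, :3].T
    inverse[:3, :3] = rotation_t
    inverse[:3, 3] = -rotation_t @ pose[:3, 3]
    return inverse


def relative_pose(world_T_a, world_T_b):
    """Pose of frame b expressed in frame a: a_T_b = inv(world_T_a) @ world_T_b."""
    return invert_pose(world_T_a) @ world_T_b


def triadic_relative_poses(world_T_left, world_T_right, world_T_obj):
    """Return the three pairwise relative transforms (hypothetical layout)."""
    return {
        "left_T_right": relative_pose(world_T_left, world_T_right),
        "left_T_obj": relative_pose(world_T_left, world_T_obj),
        "right_T_obj": relative_pose(world_T_right, world_T_obj),
    }


# Made-up example poses for the left gripper, right gripper, and object.
world_T_left = make_pose(0.0, [0.0, 0.0, 0.0])
world_T_right = make_pose(np.pi / 2, [1.0, 0.0, 0.0])
world_T_obj = make_pose(np.pi / 4, [0.5, 0.2, 0.0])

triad = triadic_relative_poses(world_T_left, world_T_right, world_T_obj)

# Triangle closure: going left -> right -> object must match left -> object.
closure_error = np.abs(
    triad["left_T_right"] @ triad["right_T_obj"] - triad["left_T_obj"]
).max()
```

Because the three transforms are derived from the same world-frame poses, the closure error is zero up to floating-point precision. A learned model that predicts the three relative poses independently would not get this for free, which is why treating the triad jointly can be useful.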
Problem

Research questions and friction points this paper is trying to address.

bimanual manipulation
robot-object interaction
geometric constraints
imitation learning
collision avoidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robot-Object Triadic Interaction
Diffusion Model
Bimanual Manipulation
6D Pose
Imitation Learning