🤖 AI Summary
Existing robotic foundation models struggle to generalize across varying viewpoints, manipulator configurations, and end-effectors beyond parallel-jaw grippers, due to biases in their training data. To address this limitation, this work proposes the Cross-Embodiment Interface (CEI), a framework that aligns heterogeneous robot trajectories through functional similarity. By leveraging the Directional Chamfer Distance and gradient-based optimization, CEI synthesizes observations and actions tailored to novel embodiments, enabling bidirectional policy transfer. The approach also supports spatial generalization and multimodal motion generation, and successfully transfers policies from a Franka Panda to 16 distinct embodiments in simulation. In real-world experiments, it achieves an average transfer ratio of 82.4% across six tasks when transferring between UR5 robots equipped with AG95 and XHand end-effectors.
📝 Abstract
Robotic foundation models trained on large-scale manipulation datasets have shown promise in learning generalist policies, but they often overfit to specific viewpoints, robot arms, and especially parallel-jaw grippers due to dataset biases. To address this limitation, we propose the Cross-Embodiment Interface (CEI), a framework for cross-embodiment learning that enables the transfer of demonstrations across different robot arm and end-effector morphologies. CEI introduces the concept of functional similarity, quantified using the Directional Chamfer Distance. It then aligns robot trajectories through gradient-based optimization and synthesizes observations and actions for unseen robot arms and end-effectors. In experiments, CEI transfers data and policies from a Franka Panda robot to 16 different embodiments across 3 tasks in simulation, and supports bidirectional transfer between a UR5+AG95 gripper robot and a UR5+XHand robot across 6 real-world tasks, achieving an average transfer ratio of 82.4%. Finally, we demonstrate that CEI can also be extended with spatial generalization and multimodal motion generation capabilities using our proposed techniques.
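The functional-similarity metric the abstract names is a directional Chamfer distance between point sets. The paper's exact formulation (point representation, weighting, direction convention) is not given here, so the following is a minimal NumPy sketch of the generic one-directional form: for each point in the source set, take the distance to its nearest neighbor in the target set, then average.

```python
import numpy as np

def directional_chamfer(src: np.ndarray, tgt: np.ndarray) -> float:
    """One-directional Chamfer distance from `src` to `tgt`.

    src, tgt: (N, 3) and (M, 3) arrays of 3D points.
    Generic textbook form; the paper's exact variant is an assumption.
    """
    # Pairwise Euclidean distances, shape (N, M), via broadcasting.
    d = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    # For each source point, distance to its nearest target point; average.
    return float(d.min(axis=1).mean())
```

Note the asymmetry: every source point must lie near some target point, but not vice versa, which is why the metric is called "directional". Because the expression is differentiable almost everywhere in the point coordinates, it is compatible with the gradient-based trajectory alignment the abstract describes.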