CEI: A Unified Interface for Cross-Embodiment Visuomotor Policy Learning in 3D Space

📅 2026-01-14
🏛️ IEEE Robotics and Automation Letters
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing robotic foundation models struggle to generalize across varying viewpoints, manipulator configurations, and end-effectors, largely because their training data is biased toward specific setups such as parallel-jaw grippers. To address this limitation, this work proposes the Cross-Embodiment Interface (CEI), a framework that aligns heterogeneous robot trajectories through functional similarity. By leveraging a directional Chamfer distance and gradient-based optimization, CEI synthesizes observations and actions tailored to novel embodiments, enabling bidirectional policy transfer. The approach also supports spatial generalization and multimodal motion generation, and successfully transfers policies from a Franka Panda to 16 distinct simulated embodiments. In real-world experiments, it achieves an average transfer ratio of 82.4% across six tasks when transferring between UR5 robots equipped with AG95 and XHand end-effectors.
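For reference, the directional (one-sided) Chamfer distance between point sets is commonly defined as below; the exact variant and weighting CEI uses are not specified on this page, so treat this as an assumed standard form, where S and T are point clouds sampled from the source and target end-effectors:

\[
d_{\mathrm{DCD}}(S, T) = \frac{1}{|S|} \sum_{s \in S} \min_{t \in T} \lVert s - t \rVert_2^2
\]

Unlike the symmetric Chamfer distance, only the source-to-target term is kept, so the loss penalizes source points that have no nearby target geometry.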

📝 Abstract
Robotic foundation models trained on large-scale manipulation datasets have shown promise in learning generalist policies, but they often overfit to specific viewpoints, robot arms, and especially parallel-jaw grippers due to dataset biases. To address this limitation, we propose the Cross-Embodiment Interface (CEI), a framework for cross-embodiment learning that enables the transfer of demonstrations across different robot arm and end-effector morphologies. CEI introduces the concept of functional similarity, quantified using the Directional Chamfer Distance. It then aligns robot trajectories through gradient-based optimization and synthesizes observations and actions for unseen robot arms and end-effectors. In experiments, CEI transfers data and policies from a Franka Panda robot to 16 different embodiments across 3 tasks in simulation, and supports bidirectional transfer between a UR5+AG95 gripper robot and a UR5+XHand robot across 6 real-world tasks, achieving an average transfer ratio of 82.4%. Finally, we demonstrate that CEI can also be extended with spatial generalization and multimodal motion generation capabilities using our proposed techniques.
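To make the alignment step concrete, here is a minimal PyTorch sketch under stated assumptions: point clouds are sampled from the source and target end-effectors, the optimization variable is a per-waypoint translation, and the names directional_chamfer and align_trajectory plus all hyperparameters are illustrative, not taken from the paper.

import torch

def directional_chamfer(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # One-directional Chamfer distance: mean squared distance from each
    # source point (N, 3) to its nearest neighbor in the target set (M, 3).
    d2 = torch.cdist(source, target).pow(2)  # pairwise squared distances, (N, M)
    return d2.min(dim=1).values.mean()

def align_trajectory(src_clouds: torch.Tensor, tgt_cloud: torch.Tensor,
                     steps: int = 200, lr: float = 1e-2) -> torch.Tensor:
    # Gradient-based alignment (assumed formulation): optimize a per-waypoint
    # translation so the source end-effector point clouds (T, N, 3) stay
    # functionally close to the target end-effector geometry (M, 3).
    offset = torch.zeros(src_clouds.shape[0], 1, 3, requires_grad=True)
    opt = torch.optim.Adam([offset], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        shifted = src_clouds + offset  # broadcast the offset over all points
        loss = torch.stack([directional_chamfer(pc, tgt_cloud) for pc in shifted]).mean()
        loss.backward()
        opt.step()
    return (src_clouds + offset).detach()

The one-sided loss penalizes only source points that lack nearby target geometry, matching the intuition that the new end-effector should cover the contact regions the demonstration relied on; the paper's full method additionally synthesizes observations and actions for the new embodiment, which this sketch does not attempt.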
Problem

Research questions and friction points this paper is trying to address.

cross-embodiment
visuomotor policy
robotic generalization
embodiment transfer
foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Embodiment Learning
Functional Similarity
Directional Chamfer Distance
Visuomotor Policy Transfer
Robot Generalization
👥 Authors
Tong Wu
BIGAI, Tsinghua University
Text Generation, Diffusion Language Model
Shoujie Li
Tsinghua University
Robot Sensing, Grasping, Embodied AI
Junhao Gong
Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Changqing Guo
Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Xingting Li
Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Shilong Mu
Xspark Ai, Shenzhen 518052, China
Wenbo Ding
University at Buffalo
Security, Machine Learning