🤖 AI Summary
This work proposes a morphology-agnostic unified control framework to address the challenge of action transfer across humanoids with diverse morphologies. By constructing a disentangled shared latent space and incorporating tailored similarity metrics based on joint rotations and end-effector positions, the approach leverages contrastive learning to align local motion patterns between humans and various humanoid robots—including single-arm, dual-arm, and legged configurations. Within this latent space, a target-conditioned policy is trained exclusively on human motion data and can be deployed across platforms without fine-tuning. New robot morphologies can be rapidly integrated via lightweight embedding layers, enabling robust, scalable, and adaptation-free cross-morphology action transfer that significantly outperforms existing methods in both accuracy and flexibility.
📝 Abstract
We present a scalable framework for cross-embodiment humanoid robot control by learning a shared latent representation that unifies motion across humans and diverse humanoid platforms, including single-arm, dual-arm, and legged humanoid robots. Our method proceeds in two stages. First, we construct a decoupled latent space that captures localized motion patterns across different body parts using contrastive learning, enabling accurate and flexible motion retargeting across robots with diverse morphologies. To improve alignment between embodiments, we introduce tailored similarity metrics that combine joint rotations and end-effector positions for critical segments such as the arms. Second, we train a goal-conditioned control policy directly in this latent space using only human data. Leveraging a conditional variational autoencoder, the policy learns to predict latent-space displacements guided by intended goal directions. We show that the trained policy can be deployed directly on multiple robots without any adaptation. Furthermore, our method supports the efficient addition of new robots to the latent space by learning only a lightweight, robot-specific embedding layer; the learned latent policies apply directly to these new robots as well. Experimental results demonstrate that our approach enables robust, scalable, and embodiment-agnostic control across a wide range of humanoid platforms.
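To make the "tailored similarity metric" concrete, the sketch below scores the agreement between two body-part poses by combining a joint-rotation term with an end-effector-position term. This is an illustrative assumption, not the paper's exact formula: the quaternion representation, the mixing weight `alpha`, and the Gaussian position kernel with width `sigma` are all placeholder choices.

```python
import numpy as np

def rotation_similarity(q1, q2):
    """Similarity between two unit quaternions.

    |<q1, q2>| equals 1 when the rotations coincide and decreases as
    they diverge; abs() handles the q ~ -q sign ambiguity.
    """
    return abs(float(np.dot(q1, q2)))

def pose_similarity(quats_a, quats_b, ee_a, ee_b, alpha=0.5, sigma=0.2):
    """Combined similarity for one body part (e.g. an arm).

    quats_*: (J, 4) unit quaternions for the part's J joints.
    ee_*:    (3,) end-effector position, assumed normalized by limb
             length so differently sized embodiments are comparable.
    alpha:   assumed weight between rotation and end-effector terms.
    """
    rot = np.mean([rotation_similarity(a, b)
                   for a, b in zip(quats_a, quats_b)])
    pos = np.exp(-np.sum((ee_a - ee_b) ** 2) / (2 * sigma ** 2))
    return alpha * rot + (1 - alpha) * pos
```

A score of this kind could then serve as the positive/negative weighting inside a contrastive objective (e.g. an InfoNCE-style loss) that pulls human and robot poses of the same local motion together in the shared latent space.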
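The goal-conditioned policy can be pictured as a CVAE decoder that, at each control step, maps a sampled latent code plus the current latent state and an intended goal direction to a latent-space displacement. The sketch below shows only this forward pass; the layer sizes, single hidden layer, and random weights are illustrative assumptions (in the paper the policy is trained on human motion data, and the resulting latent trajectory is decoded per robot).

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, GOAL_DIM, Z_DIM, HIDDEN = 32, 3, 8, 64

# Placeholder decoder weights; a trained policy would supply these.
W1 = rng.normal(0.0, 0.1, (HIDDEN, LATENT_DIM + GOAL_DIM + Z_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (LATENT_DIM, HIDDEN))
b2 = np.zeros(LATENT_DIM)

def policy_step(latent, goal_dir, rng):
    """One control step: sample z ~ N(0, I), decode a displacement
    conditioned on the current latent state and the goal direction,
    and integrate it to obtain the next latent state."""
    z = rng.normal(size=Z_DIM)
    cond = np.concatenate([latent, goal_dir / np.linalg.norm(goal_dir), z])
    h = np.tanh(W1 @ cond + b1)
    delta = W2 @ h + b2          # predicted latent-space displacement
    return latent + delta        # decoded into joint commands per robot

latent = np.zeros(LATENT_DIM)
latent = policy_step(latent, np.array([1.0, 0.0, 0.0]), rng)
```

Because the policy acts entirely in the shared latent space, deploying it on a new morphology only requires the robot's own lightweight embedding/decoding layer, which matches the adaptation-free transfer claimed in the abstract.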