Scaling Cross-Embodiment World Models for Dexterous Manipulation

📅 2025-11-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of generalizing dexterous manipulation across morphologically diverse robots. It introduces a morphology-agnostic representation and policy-transfer framework with three parts: (1) embodiments are modeled as sets of 3D particles and actions as per-particle displacement vectors, yielding state-action representations that are morphology-invariant yet preserve control semantics; (2) a graph-structured world model is trained jointly on exploration data from diverse simulated robot hands and real human hands; and (3) the learned model is combined with model-predictive control for deployment across hardware. To our knowledge, this is the first approach enabling joint training on human hand and robotic hand data. On rigid and deformable object manipulation tasks, generalization improves as the number of training morphologies grows, the framework outperforms morphology-specific baselines, and it transfers to robots with widely varying degrees of freedom, from low-DOF to high-DOF platforms.

📝 Abstract

Cross-embodiment learning seeks to build generalist robots that operate across diverse morphologies, but differences in action spaces and kinematics hinder data sharing and policy transfer. This raises a central question: Is there any invariance that allows actions to transfer across embodiments? We conjecture that environment dynamics are embodiment-invariant, and that world models capturing these dynamics can provide a unified interface across embodiments. To learn such a unified world model, the crucial step is to design state and action representations that abstract away embodiment-specific details while preserving control relevance. To this end, we represent different embodiments (e.g., human hands and robot hands) as sets of 3D particles and define actions as particle displacements, creating a shared representation for heterogeneous data and control problems. A graph-based world model is then trained on exploration data from diverse simulated robot hands and real human hands, and integrated with model-based planning for deployment on novel hardware. Experiments on rigid and deformable manipulation tasks reveal three findings: (i) scaling to more training embodiments improves generalization to unseen ones, (ii) co-training on both simulated and real data outperforms training on either alone, and (iii) the learned models enable effective control on robots with varied degrees of freedom. These results establish world models as a promising interface for cross-embodiment dexterous manipulation.
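The abstract says the learned world model is "integrated with model-based planning" but does not name a planner. As a rough illustration only, the sketch below uses random-shooting planning, which is an assumption and not necessarily the authors' method. The learned graph-based model is replaced by a hand-written stand-in, `toy_dynamics`, and all names (`toy_dynamics`, `cost`, `plan`), the contact radius, and the rigid-hand sampling scheme are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
State = Tuple[List[Point], List[Point]]  # (hand particles, object particles)


def toy_dynamics(state: State, action: List[Point], contact: float = 0.03) -> State:
    """Hypothetical stand-in for a learned world model.

    Hand particles move by their displacement. An object particle copies the
    displacement of its nearest hand particle when that particle ends up
    within `contact` of it; otherwise the object particle stays put.
    """
    hand, obj = state
    new_hand = [tuple(h[k] + a[k] for k in range(3)) for h, a in zip(hand, action)]
    new_obj = []
    for o in obj:
        i = min(range(len(new_hand)), key=lambda idx: math.dist(new_hand[idx], o))
        if math.dist(new_hand[i], o) < contact:
            o = tuple(o[k] + action[i][k] for k in range(3))
        new_obj.append(o)
    return new_hand, new_obj


def cost(state: State, goal: List[Point]) -> float:
    """Mean distance between object particles and their goal positions."""
    _, obj = state
    return sum(math.dist(o, g) for o, g in zip(obj, goal)) / len(obj)


def plan(state: State, goal: List[Point],
         model: Callable[[State, List[Point]], State],
         horizon: int = 5, samples: int = 200, step: float = 0.02,
         seed: int = 0) -> Tuple[List[Point], float]:
    """Random-shooting planner: sample action sequences, roll them out through
    the model, and return the first action of the cheapest sequence."""
    rng = random.Random(seed)
    n_hand = len(state[0])
    # Start from the no-op action so the plan is never worse than standing still.
    best_first = [(0.0, 0.0, 0.0)] * n_hand
    best_cost = cost(state, goal)
    for _ in range(samples):
        s, first = state, None
        for _ in range(horizon):
            # Assumption: the hand moves rigidly, so all particles share one displacement.
            d = tuple(rng.uniform(-step, step) for _ in range(3))
            a = [d] * n_hand
            if first is None:
                first = a
            s = model(s, a)
        c = cost(s, goal)
        if c < best_cost:
            best_first, best_cost = first, c
    return best_first, best_cost


# Two hand particles next to one object particle; push the object toward +x.
state = ([(0.0, 0.0, 0.0), (0.0, 0.01, 0.0)], [(0.02, 0.0, 0.0)])
goal = [(0.06, 0.0, 0.0)]
first_action, predicted_cost = plan(state, goal, toy_dynamics)
```

Because the planner only sees particle positions and displacements, the same `plan` function could in principle be reused for any embodiment; mapping the chosen displacements back to joint commands is hardware-specific and is left out here.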

Problem

Research questions and friction points this paper is trying to address.

Transferring actions across robots with different morphologies and kinematics

Learning embodiment-invariant world models as a unified control interface

Developing shared representations for heterogeneous robot and human hand data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Representing embodiments as 3D particle sets

Defining actions as particle displacement operations

Training graph-based world models with cross-embodiment data
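The three ideas above can be made concrete with a small sketch. It assumes, for illustration only, that each embodiment is a list of 3D points, that an action is the per-particle difference between the next and current positions, and that graph edges connect particles closer than a fixed radius. The paper's actual particle sampling and graph construction may differ; `displacement_action`, `radius_graph`, and the coordinates are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float, float]


def displacement_action(current: List[Point], target: List[Point]) -> List[Point]:
    """Express an action as one displacement vector per particle."""
    if len(current) != len(target):
        raise ValueError("particle sets must have the same size")
    return [(t[0] - c[0], t[1] - c[1], t[2] - c[2]) for c, t in zip(current, target)]


def radius_graph(particles: List[Point], radius: float) -> List[Tuple[int, int]]:
    """Connect every pair of particles closer than `radius` (undirected edges)."""
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(particles), 2)
            if math.dist(p, q) < radius]


# Two hypothetical embodiments with different particle counts, plus one object particle.
gripper = [(0.0, 0.0, 0.10), (0.04, 0.0, 0.10)]
hand = [(0.0, 0.0, 0.10), (0.02, 0.0, 0.10), (0.04, 0.0, 0.10)]
obj = [(0.02, 0.0, 0.05)]

# Closing the gripper by 1 cm per finger becomes two displacement vectors.
gripper_next = [(0.01, 0.0, 0.10), (0.03, 0.0, 0.10)]
action = displacement_action(gripper, gripper_next)

# Both embodiments yield the same kind of input: nodes with positions and edges.
edges_gripper = radius_graph(gripper + obj, radius=0.06)
edges_hand = radius_graph(hand + obj, radius=0.06)
```

The point of the sketch is that the gripper and the hand differ only in how many nodes and edges they contribute, so a single graph-based model can consume data from either.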

🔎 Similar Papers

DexSim2Real2: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation

2024-09-13 · arXiv.org · Citations: 0

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

2024-04-28 · arXiv.org · Citations: 15

Omnigrasp: Grasping Diverse Objects with Simulated Humanoids

2024-07-16 · Neural Information Processing Systems · Citations: 16

D(R, O) Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping

2024-10-02 · arXiv.org · Citations: 2