Latent Policy Steering with Embodiment-Agnostic Pretrained World Models

📅 2025-07-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of real-world demonstration data in visuomotor policy learning, this paper proposes Latent Policy Steering (LPS). It uses optical flow as an embodiment-agnostic action representation to train a world model that transfers across morphologies, pretraining it on heterogeneous data including public robot datasets and videos of humans playing with objects. The world model is then fine-tuned on a small amount of data from the target robot, and LPS improves a behavior-cloned policy's output by searching the world model's latent space for better action sequences. The core contribution is decoupling the action representation from the physical embodiment, so motion priors from non-robot sources (e.g., human videos) transfer effectively. Experiments show over 50% relative improvement in task success with only 30 real-robot demonstrations and over 20% with 50 demonstrations, substantially reducing data-collection cost while improving generalization across morphologies and tasks.
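The key idea behind the pretraining — optical flow as an embodiment-agnostic action label — can be illustrated with a toy NumPy sketch. Here `global_flow` is a hypothetical stand-in for the dense flow estimator the paper would use (it recovers only a single global translation by brute force): two visually different "embodiments" undergoing the same motion map to the same flow-based action.

```python
import numpy as np

def global_flow(prev, curr, max_shift=3):
    """Toy flow estimator: find the integer translation (dy, dx) that best
    aligns prev to curr by brute-force search (stand-in for dense optical
    flow; illustrative only, not the paper's estimator)."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
            err = np.sum((shifted - curr) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Two "embodiments" with entirely different appearance...
rng = np.random.default_rng(0)
frame_a = rng.random((16, 16))
frame_b = rng.random((16, 16))

# ...performing the same motion (shift down 2, right 1)
move = lambda f: np.roll(np.roll(f, 2, axis=0), 1, axis=1)
flow_a = global_flow(frame_a, move(frame_a))
flow_b = global_flow(frame_b, move(frame_b))
# flow_a == flow_b: the action label is independent of the embodiment
```

In the paper's setting the flow field is dense and learned, but the property exploited is the same: the representation depends on scene motion, not on the agent's morphology or appearance, so robot and human data can supervise one world model.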

📝 Abstract
Learning visuomotor policies via imitation has proven effective across a wide range of robotic domains. However, the performance of these policies is heavily dependent on the number of training demonstrations, which requires expensive data collection in the real world. In this work, we aim to reduce data collection efforts when learning visuomotor robot policies by leveraging existing or cost-effective data from a wide range of embodiments, such as public robot datasets and the datasets of humans playing with objects (human data from play). Our approach leverages two key insights. First, we use optic flow as an embodiment-agnostic action representation to train a World Model (WM) across multi-embodiment datasets, and finetune it on a small amount of robot data from the target embodiment. Second, we develop a method, Latent Policy Steering (LPS), to improve the output of a behavior-cloned policy by searching in the latent space of the WM for better action sequences. In real world experiments, we observe significant improvements in the performance of policies trained with a small amount of data (over 50% relative improvement with 30 demonstrations and over 20% relative improvement with 50 demonstrations) by combining the policy with a WM pretrained on two thousand episodes sampled from the existing Open X-embodiment dataset across different robots or a cost-effective human dataset from play.
Problem

Research questions and friction points this paper is trying to address.

Reduce data collection for visuomotor policy learning
Leverage multi-embodiment datasets for pretraining
Improve policy performance with limited demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses optical flow as an embodiment-agnostic action representation
Finetunes the world model on a small amount of target-embodiment data
Improves policy output via latent-space search (LPS)
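The latent-space search behind LPS can be sketched minimally as follows. The linear `dynamics` and `value` functions below are illustrative stand-ins for the pretrained world model, not the paper's architecture: perturb the behavior-cloned action sequence, roll each candidate out in latent space, and keep the highest-scoring one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the pretrained world model (assumed, not the
# paper's API): a latent dynamics step and a scalar value head.
W_dyn = 0.1 * rng.normal(size=(8, 8))
W_act = 0.1 * rng.normal(size=(8, 2))
w_val = rng.normal(size=8)

def dynamics(z, a):
    """One latent transition z' = f(z, a)."""
    return np.tanh(W_dyn @ z + W_act @ a)

def rollout_value(z0, actions):
    """Roll an action sequence out in latent space and score the endpoint."""
    z = z0
    for a in actions:
        z = dynamics(z, a)
    return float(w_val @ z)

def latent_policy_steering(z0, bc_actions, n_candidates=64, noise=0.1):
    """Perturb the behavior-cloned sequence, evaluate each candidate in the
    world model's latent space, and return the best-scoring sequence."""
    best_seq, best_score = bc_actions, rollout_value(z0, bc_actions)
    for _ in range(n_candidates):
        cand = bc_actions + noise * rng.normal(size=bc_actions.shape)
        score = rollout_value(z0, cand)
        if score > best_score:
            best_seq, best_score = cand, score
    return best_seq, best_score

z0 = rng.normal(size=8)                       # current latent state
bc_actions = 0.05 * rng.normal(size=(5, 2))   # policy's proposed 5-step plan
steered, score = latent_policy_steering(z0, bc_actions)
```

Because the behavior-cloned sequence is itself scored first, the steered output can only match or improve on the policy's proposal under the world model's value estimate; the search refines rather than replaces the learned policy.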