DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories

📅 2025-05-19
🤖 AI Summary
Robotic policies often generalize poorly to novel behaviors and unseen environments. Method: This paper proposes DreamGen, a four-stage framework that uses a video world model to generate embodiment-consistent synthetic neural trajectories (i.e., videos of the robot acting), then applies latent action modeling or inverse dynamics modeling to recover high-fidelity pseudo-action labels. It is the first work to adapt image-to-video generation models for embodied agents, establishing a "neural-trajectory-driven generalization" paradigm. Contribution/Results: The paper introduces DreamGen Bench, the first benchmark explicitly designed to evaluate this kind of generalization, and empirically demonstrates a strong positive correlation between video generation quality and downstream policy success rates. Using teleoperated data from only a single task in a single environment, DreamGen achieves zero-shot transfer to 22 novel behaviors in both seen and unseen environments, substantially improving cross-behavior and cross-environment generalization.

📝 Abstract
We introduce DreamGen, a simple yet highly effective 4-stage pipeline for training robot policies that generalize across behaviors and environments through neural trajectories: synthetic robot data generated from video world models. DreamGen leverages state-of-the-art image-to-video generative models, adapting them to the target robot embodiment to produce photorealistic synthetic videos of familiar or novel tasks in diverse environments. Since these models generate only videos, we recover pseudo-action sequences using either a latent action model or an inverse-dynamics model (IDM). Despite its simplicity, DreamGen unlocks strong behavior and environment generalization: a humanoid robot can perform 22 new behaviors in both seen and unseen environments, while requiring teleoperation data from only a single pick-and-place task in one environment. To evaluate the pipeline systematically, we introduce DreamGen Bench, a video generation benchmark that shows a strong correlation between benchmark performance and downstream policy success. Our work establishes a promising new axis for scaling robot learning well beyond manual data collection.
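The abstract's four stages can be pictured as a single orchestration loop. The sketch below is a minimal, hypothetical rendering in plain Python: every function name (`finetune_video_model`, `label_actions`, etc.) is an assumption for illustration, not the paper's actual API, and the real stages involve fine-tuning a large video world model and a learned action-labeling model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A "neural trajectory" pairs a generated video with recovered pseudo-actions.
@dataclass
class NeuralTrajectory:
    frames: List[object]          # generated video frames
    pseudo_actions: List[object]  # one pseudo-action per frame transition

def dreamgen_pipeline(
    finetune_video_model: Callable[[List[object]], Callable[[object, str], List[object]]],
    teleop_videos: List[object],
    prompts: List[Tuple[object, str]],  # (initial image, language instruction)
    label_actions: Callable[[List[object]], List[object]],  # latent action model or IDM
) -> List[NeuralTrajectory]:
    """Hypothetical orchestration of the four DreamGen stages."""
    # Stage 1: adapt an image-to-video world model to the robot embodiment.
    world_model = finetune_video_model(teleop_videos)
    trajectories = []
    for init_image, instruction in prompts:
        # Stage 2: "dream" a photorealistic rollout for a (possibly novel) task.
        frames = world_model(init_image, instruction)
        # Stage 3: recover pseudo-action labels from the video alone.
        actions = label_actions(frames)
        trajectories.append(NeuralTrajectory(frames, actions))
    # Stage 4 (outside this sketch): train the visuomotor policy on `trajectories`.
    return trajectories
```

Note that only stage 1 touches teleoperated data; stages 2–3 can be prompted with novel instructions and environments, which is what enables the cross-behavior and cross-environment transfer the abstract describes.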
Problem

Research questions and friction points this paper is trying to address.

Training robot policies that generalize across behaviors and environments
Generating photorealistic synthetic robot data from video world models
Reducing reliance on manual data collection in robot learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Four-stage pipeline for robot policy training
Adaptation of image-to-video generative models to the target robot embodiment
Pseudo-action recovery via latent action models or inverse dynamics models (IDM)
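The pseudo-action recovery step above can be pictured with a toy inverse dynamics model: given consecutive observations, predict the action that connects them. The following is a deliberately simplified stand-in using finite differences on scalar proxy states, not the paper's learned IDM, which maps pairs of video frames to robot actions.

```python
from typing import List, Sequence

def finite_difference_idm(states: Sequence[float]) -> List[float]:
    """Toy inverse dynamics: the pseudo-action between consecutive
    observations is simply the state delta. A learned IDM would instead
    infer robot actions from pairs of video frames."""
    return [b - a for a, b in zip(states, states[1:])]

# Sliding over a "video" of proxy states yields one pseudo-action per
# frame transition, mirroring how an IDM labels neural trajectories.
pseudo_actions = finite_difference_idm([0.0, 0.5, 1.5, 1.5])
# → [0.5, 1.0, 0.0]
```

A sequence of T observations thus yields T−1 pseudo-action labels, which is exactly the supervision format a video-only world model cannot provide on its own.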
Joel Jang
Research Scientist, Nvidia
Seonghyeon Ye
KAIST
Machine Learning, Robot Learning
Zongyu Lin
UCLA
Large Foundation Model, Pretraining, Reasoning
Jiannan Xiang
University of California, San Diego
Natural Language Processing
Johan Bjorck
Cornell University
Computer science
Yu Fang
Honda Research Institute Japan Co., Ltd.
Human-Robot Interaction, Eye-head Coordination, Eye Movement, Visual Perception/Cognition
Fengyuan Hu
Research Engineer, NVIDIA
Robotics, AI/ML, NLP, Cognitive Science
Spencer Huang
NVIDIA
Kaushil Kundalia
NVIDIA
Yen-Chen Lin
NVIDIA
Loic Magne
NVIDIA
Ajay Mandlekar
Research Scientist, NVIDIA
Robot Learning, Robotics, Machine Learning, Artificial Intelligence
Avnish Narayan
NVIDIA
You Liang Tan
UC Berkeley
Guanzhi Wang
NVIDIA, Caltech
Jing Wang
NVIDIA, NTU
Qi Wang
NVIDIA
Yinzhen Xu
Peking University
Computer Vision, Robotics
Xiaohui Zeng
NVIDIA
Kaiyuan Zheng
University of Washington
Ruijie Zheng
University of Maryland, College Park, NVIDIA
Machine Learning, Reinforcement Learning
Ming-Yu Liu
NVIDIA
Luke Zettlemoyer
University of Washington; Meta
Natural Language Processing, Semantics, Machine Learning, Artificial Intelligence
Dieter Fox
University of Washington and AI2
Robotics, Artificial Intelligence, Computer Vision
Jan Kautz
Vice President of Research, NVIDIA Research
Computer Vision, Machine Learning, Visual Computing
Scott Reed
Research Scientist, NVIDIA Research
Artificial Intelligence, Machine Learning, Deep Learning
Yuke Zhu
The University of Texas at Austin, NVIDIA Research
Robot Learning, Computer Vision, Machine Learning, Robotics, Artificial Intelligence
Linxi Fan
NVIDIA