🤖 AI Summary
Robot policies often generalize poorly to novel behaviors and unseen environments. Method: This paper proposes DreamGen, a four-stage framework that fine-tunes a video world model on the target robot embodiment to generate photorealistic synthetic videos of robot behavior, then recovers pseudo-action labels with a latent action model or an inverse dynamics model (IDM); the resulting video-plus-pseudo-action data are called neural trajectories. By adapting image-to-video generation models to embodied agents, the work establishes a neural-trajectory-driven paradigm for generalization. Contribution/Results: The authors introduce DreamGen Bench, a benchmark for evaluating video world models on robot data, and empirically demonstrate a strong positive correlation between video generation quality and downstream policy success rates. Using only teleoperated data from a single pick-and-place task in a single environment, DreamGen enables a humanoid robot to perform 22 novel behaviors across both seen and unseen environments, substantially improving cross-behavior and cross-environment generalization.
📝 Abstract
We introduce DreamGen, a simple yet highly effective four-stage pipeline for training robot policies that generalize across behaviors and environments through neural trajectories: synthetic robot data generated from video world models. DreamGen leverages state-of-the-art image-to-video generative models, adapting them to the target robot embodiment to produce photorealistic synthetic videos of familiar or novel tasks in diverse environments. Since these models generate only videos, we recover pseudo-action sequences using either a latent action model or an inverse dynamics model (IDM). Despite its simplicity, DreamGen unlocks strong behavior and environment generalization: a humanoid robot can perform 22 new behaviors in both seen and unseen environments, while requiring teleoperation data from only a single pick-and-place task in one environment. To evaluate the pipeline systematically, we introduce DreamGen Bench, a video generation benchmark that shows a strong correlation between benchmark performance and downstream policy success. Our work establishes a promising new axis for scaling robot learning well beyond manual data collection.
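The four stages described in the abstract can be sketched schematically. This is a minimal illustrative outline, not the authors' implementation: every function and class name below is hypothetical, and each stage body is a stub standing in for a heavy training or inference step.

```python
# Hypothetical sketch of the four-stage DreamGen pipeline; all names here
# are illustrative placeholders, not the paper's actual API.

from dataclasses import dataclass


@dataclass
class NeuralTrajectory:
    """A synthetic rollout: generated video frames plus pseudo-actions."""
    frames: list            # generated video frames (stage 2)
    pseudo_actions: list    # recovered pseudo-action labels (stage 3)


def finetune_video_model(base_model: str, teleop_videos: list) -> str:
    """Stage 1: adapt an image-to-video world model to the robot embodiment."""
    return f"{base_model}-finetuned-on-{len(teleop_videos)}-demos"


def generate_videos(model: str, prompts: list) -> list:
    """Stage 2: roll out synthetic videos for novel tasks/environments."""
    return [[f"{model}:{p}:frame{i}" for i in range(3)] for p in prompts]


def label_actions(video: list) -> list:
    """Stage 3: recover pseudo-actions via an IDM or latent action model."""
    # One pseudo-action per frame transition.
    return [f"action({a}->{b})" for a, b in zip(video, video[1:])]


def train_policy(trajectories: list) -> str:
    """Stage 4: train the visuomotor policy on the neural trajectories."""
    return f"policy trained on {len(trajectories)} neural trajectories"


# Usage: one teleoperated task expands into many novel-task trajectories.
model = finetune_video_model("i2v-base", teleop_videos=["pick-and-place"])
videos = generate_videos(model, prompts=["open drawer", "pour water"])
trajs = [NeuralTrajectory(v, label_actions(v)) for v in videos]
print(train_policy(trajs))
```

The key structural point the sketch captures is that only stage 1 consumes real teleoperated data; stages 2-4 operate entirely on synthetic videos and pseudo-actions, which is what decouples the pipeline from manual data collection.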