World Model for Robot Learning: A Comprehensive Survey

📅 2026-04-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work presents a comprehensive survey of world models in robot learning, synthesizing their key paradigms, functional roles, and development. To address fragmentation across architectures, functional roles, and application domains, it examines how world models are coupled with robot policies, how they serve as learned simulators for reinforcement learning and evaluation, and how robotic video world models have progressed from imagination-based generation to controllable, structured, and foundation-scale formulations. The survey connects these ideas to navigation and autonomous driving, catalogs representative datasets, benchmarks, and evaluation protocols, and is accompanied by a GitHub repository that the authors plan to maintain and update regularly.
📝 Abstract
World models, which are predictive representations of how environments evolve under actions, have become a central component of robot learning. They support policy learning, planning, simulation, evaluation, and data generation, and have advanced rapidly with the rise of foundation models and large-scale video generation. However, the literature remains fragmented across architectures, functional roles, and embodied application domains. To address this gap, we present a comprehensive review of world models from a robot-learning perspective. We examine how world models are coupled with robot policies, how they serve as learned simulators for reinforcement learning and evaluation, and how robotic video world models have progressed from imagination-based generation to controllable, structured, and foundation-scale formulations. We further connect these ideas to navigation and autonomous driving, and summarize representative datasets, benchmarks, and evaluation protocols. Overall, this survey systematically reviews the rapidly growing literature on world models for robot learning, clarifies key paradigms and applications, and highlights major challenges and future directions for predictive modeling in embodied agents. To facilitate continued access to newly emerging works, benchmarks, and resources, we will maintain and regularly update the accompanying GitHub repository alongside this survey.
Problem

Research questions and friction points this paper is trying to address.

world models
robot learning
predictive modeling
embodied agents
literature fragmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

World Models
Robot Learning
Learned Simulators
Video Generation
Embodied Agents
Bohan Hou
PhD of Computer Science, Carnegie Mellon University
Machine Learning, Systems

Gen Li
Postdoctoral Research Fellow, Nanyang Technological University
Embodied AI, Computer Vision, Robotics, Artificial Intelligence

Jindou Jia
Nanyang Technological University

Tuo An
Nanyang Technological University

Xinying Guo
Nanyang Technological University

Sicong Leng
Nanyang Technological University
Multi-modal Learning

Haoran Geng
PhD Student, UC Berkeley
Robotics, Computer Vision, Reinforcement Learning

Yanjie Ze
Stanford University
Robotics, Embodied AI, Humanoid Robots

Tatsuya Harada
The University of Tokyo
Computer Vision, Machine Learning, Intelligent Robot

Philip Torr
Professor, University of Oxford
Department of Engineering

Oier Mees
Microsoft
Robotics, Machine Learning, Computer Vision, Robot Learning

Marc Pollefeys
Professor of Computer Science, ETH Zurich, and Director Spatial AI Lab, Microsoft
Computer Vision, Computer Graphics, Robotics, Machine Learning, Augmented Reality

Zhuang Liu
Assistant Professor, Princeton University
Deep Learning, Computer Vision, Machine Learning

Jiajun Wu
Stanford University
Computer Vision, Robotics, Artificial Intelligence, Machine Learning, Cognitive Science

Pieter Abbeel
UC Berkeley | Covariant
Robotics, Machine Learning, AI

Jitendra Malik
University of California, Berkeley

Yilun Du
Harvard University
Artificial Intelligence, Machine Learning, Robotics, Computer Vision

Jianfei Yang
Assistant Professor, Director of MARS Lab, Nanyang Technological University
Physical AI, Embodied AI, Multimodal AI