A Step Toward World Models: A Survey on Robotic Manipulation

📅 2025-10-31
🤖 AI Summary
Autonomous robotic manipulation critically requires world models capable of understanding the physical mechanisms and dynamics of the environment; however, existing definitions remain ambiguous and capability boundaries ill-defined, hindering both generality and practical deployment. Method: This work adopts a functional perspective to systematically characterize the core capabilities of world models for robotic manipulation, proposing a task-driven, unified framework that integrates state representation learning, dynamic modeling, sequence prediction, closed-loop control, and model-based reinforcement learning. Contribution: We formally identify the essential components and functional roles of world models within the perception–prediction–control loop for the first time. Moving beyond conventional static representations, we emphasize dynamic modeling, causal reasoning, and online planning as critical capabilities. Furthermore, we provide a clear capability taxonomy and a systematic construction methodology, enabling the development of generalizable, scalable world models for robotics.
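As an illustration only, the perception–prediction–control loop named in the summary can be sketched on a toy one-dimensional problem. Nothing here comes from the survey itself: the functions `encode`, `predict`, `plan`, and `run_episode`, the additive dynamics, and the random-shooting planner are all hypothetical stand-ins chosen to keep the example small. A real manipulation system would learn the encoder and dynamics from data.

```python
import random

def encode(observation):
    """Perception: map a raw observation to a state (identity in this toy)."""
    return float(observation)

def predict(state, action):
    """Prediction: assumed dynamics model; here simply state + action."""
    return state + action

def plan(state, goal, horizon=5, candidates=64, rng=random):
    """Control: random-shooting planner over imagined rollouts.

    Samples candidate action sequences, rolls each one out through
    `predict`, and returns the first action of the cheapest rollout.
    """
    best_cost, best_first = float("inf"), 0.0
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for a in actions:
            s = predict(s, a)
            cost += (s - goal) ** 2  # squared distance to the goal
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first

def run_episode(start, goal, steps=20, seed=0):
    """Closed loop: re-observe and re-plan at every step."""
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        state = encode(x)
        action = plan(state, goal, rng=rng)
        x = x + action  # toy environment; it happens to match the model
    return x

final_position = run_episode(start=0.0, goal=3.0)
```

Because only the first planned action is executed before re-planning, errors in any single rollout are corrected at the next step, which is the sense in which the loop is closed.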

📝 Abstract
Autonomous agents are increasingly expected to operate in complex, dynamic, and uncertain environments, performing tasks such as manipulation, navigation, and decision-making. Achieving these capabilities requires agents to understand the underlying mechanisms and dynamics of the world, moving beyond reactive control or simple replication of observed states. This motivates the development of world models as internal representations that encode environmental states, capture dynamics, and support prediction, planning, and reasoning. Despite growing interest, the definition, scope, architectures, and essential capabilities of world models remain ambiguous. In this survey, we neither prescribe a fixed definition nor limit our scope to methods explicitly labeled as world models. Instead, we examine approaches that exhibit the core capabilities of world models through a review of methods in robotic manipulation. We analyze their roles across perception, prediction, and control, identify key challenges and solutions, and distill the core components, capabilities, and functions that a fully realized world model should possess. Building on this analysis, we aim to motivate further development toward generalizable and practical world models for robotics.
Problem

Research questions and friction points this paper is trying to address.

Surveying robotic manipulation methods to define world model capabilities
Analyzing the roles of world models across perception, prediction, and control
Developing a roadmap for generalizable, practical world models in robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveys robotic manipulation methods that exhibit world model capabilities
Analyzes core capabilities across perception, prediction, and control
Outlines a roadmap for generalizable, practical world models
Peng-Fei Zhang
University of Queensland
Ying Cheng
School of Computer Science and Technology, Tongji University
Xiaofan Sun
School of Computer Science and Technology, Tongji University
Shijie Wang
School of Computer Science and Technology, Tongji University
Fengling Li
University of Technology Sydney
Cross-modal Analysis, Domain Adaptation, Multimodal Learning
Lei Zhu
School of Computer Science and Technology, Tongji University
Heng Tao Shen
School of Computer Science and Technology, Tongji University