Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

📅 2024-05-06
🏛️ arXiv.org
📈 Citations: 63
Influential: 0
🤖 AI Summary
Does Sora possess general-purpose world modeling capabilities? This paper systematically evaluates large models' world modeling abilities across three domains: video generation, autonomous driving, and agentic AI. It analyzes physical understanding, simulation mechanisms, and deployment bottlenecks. Methodologically, it introduces a unified evaluation framework, described as the first of its kind, that integrates generative modeling, physics-guided modeling, multimodal representation learning, causal reasoning, and reinforcement learning. It proposes a taxonomy and evolutionary roadmap for AGI-oriented world models and distills five quantifiable core capability dimensions, identifying key limitations in fidelity, causality, scalability, embodiment, and real-time adaptation. The authors also release an open-source, dynamically updated review platform, presented as the first comprehensive benchmark resource dedicated to world modeling research.

📝 Abstract
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attracted significant attention due to its remarkable simulation capabilities, which exhibit an incipient comprehension of physical laws. In this survey, we embark on a comprehensive exploration of the latest advancements in world models. Our analysis navigates through the forefront of generative methodologies in video generation, where world models stand as pivotal constructs facilitating the synthesis of highly realistic visual content. Additionally, we scrutinize the burgeoning field of autonomous-driving world models, meticulously delineating their indispensable role in reshaping transportation and urban mobility. Furthermore, we delve into the intricacies inherent in world models deployed within autonomous agents, shedding light on their profound significance in enabling intelligent interactions within dynamic environmental contexts. Finally, we examine the challenges and limitations of world models and discuss potential future directions. We hope this survey can serve as a foundational reference for the research community and inspire continued innovation. This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey.
Problem

Research questions and friction points this paper is trying to address.

Surveying advancements in general world models for AGI development
Analyzing Sora's simulation capabilities and physical law comprehension
Exploring world models in video generation and autonomous systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveying latest advancements in world models
Analyzing generative video synthesis methodologies
Exploring autonomous driving model applications
Zheng Zhu
GigaAI, Beijing, China
Xiaofeng Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Wangbo Zhao
National University of Singapore
Efficient Deep Learning, Dynamic Neural Network, Multimodal Model
Chen Min
Institute of Computing Technology, Beijing, China
Nianchen Deng
Shanghai AI Laboratory
CG, AR, VR
Min Dou
Shanghai AI Laboratory
Autonomous Driving, MLLM, Embodied AI
Yuqi Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Botian Shi
Shanghai Artificial Intelligence Laboratory
VLMs, Document Understanding, Autonomous Driving
Kai Wang
National University of Singapore, Singapore
Chi Zhang
Mach Drive, Beijing, China
Yang You
Postdoc, Stanford University
3D vision, computer graphics, computational geometry
Zhaoxiang Zhang
Institute of Automation, Chinese Academy of Sciences
Computer Vision, Pattern Recognition, Biologically-inspired Learning
Dawei Zhao
Defense Innovation Institute, Beijing, China
Liang Xiao
Defense Innovation Institute, Beijing, China
Jian Zhao
EVOL Lab, Institute of AI, China Telecom, and Northwestern Polytechnical University
Jiwen Lu
Tsinghua University, Beijing, China
Guan Huang
GigaAI, Beijing, China