OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current 4D world modeling is hindered by the scarcity of high-quality, highly dynamic, multi-domain data. Existing benchmarks suffer from limited spatiotemporal complexity, insufficient modality diversity, and inadequate support for key tasks such as 4D geometric reconstruction, future prediction, and camera-controlled video generation. To address this, we introduce OmniWorld, a large-scale, multi-domain, multi-modal dataset for 4D world modeling. It combines a newly collected, interaction-rich, photorealistic sub-dataset, OmniWorld-Game, carrying fine-grained spatiotemporal annotations, with several curated public datasets spanning diverse domains. On this basis we establish a challenging benchmark that exposes the limitations of current state-of-the-art approaches in complex 4D environments. Fine-tuning existing state-of-the-art methods on OmniWorld yields significant gains in both 4D reconstruction and video generation, validating the role of data-driven paradigms in advancing general-purpose 4D understanding.

📝 Abstract
The field of 4D world modeling - aiming to jointly capture spatial geometry and temporal dynamics - has witnessed remarkable progress in recent years, driven by advances in large-scale generative models and multimodal learning. However, the development of truly general 4D world models remains fundamentally constrained by the availability of high-quality data. Existing datasets and benchmarks often lack the dynamic complexity, multi-domain diversity, and spatial-temporal annotations required to support key tasks such as 4D geometric reconstruction, future prediction, and camera-control video generation. To address this gap, we introduce OmniWorld, a large-scale, multi-domain, multi-modal dataset specifically designed for 4D world modeling. OmniWorld consists of a newly collected OmniWorld-Game dataset and several curated public datasets spanning diverse domains. Compared with existing synthetic datasets, OmniWorld-Game provides richer modality coverage, larger scale, and more realistic dynamic interactions. Based on this dataset, we establish a challenging benchmark that exposes the limitations of current state-of-the-art (SOTA) approaches in modeling complex 4D environments. Moreover, fine-tuning existing SOTA methods on OmniWorld leads to significant performance gains across 4D reconstruction and video generation tasks, strongly validating OmniWorld as a powerful resource for training and evaluation. We envision OmniWorld as a catalyst for accelerating the development of general-purpose 4D world models, ultimately advancing machines' holistic understanding of the physical world.
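The abstract emphasizes spatial-temporal annotations and broad modality coverage in support of 4D geometric reconstruction. As a rough, hypothetical sketch, and assuming per-frame depth maps plus camera intrinsics/extrinsics are among the provided modalities (typical for 4D reconstruction benchmarks, though not spelled out above), the snippet below lifts one annotated frame into a world-space point cloud, the basic operation behind 4D reconstruction. The field names and shapes are illustrative assumptions, not OmniWorld's actual release format.

```python
# Hypothetical per-frame sample for a multi-modal 4D dataset.
# Field names and shapes are illustrative assumptions, not OmniWorld's
# actual release format.
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameSample:
    rgb: np.ndarray          # (H, W, 3) uint8 color image
    depth: np.ndarray        # (H, W) float32 depth map
    intrinsics: np.ndarray   # (3, 3) pinhole camera matrix K
    extrinsics: np.ndarray   # (4, 4) world-to-camera pose T_cw
    timestamp: float         # seconds within the clip

def backproject(frame: FrameSample) -> np.ndarray:
    """Lift a depth map into a world-space point cloud of shape (N, 3)."""
    h, w = frame.depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T      # (3, N) pixel homogeneous coords
    cam = (np.linalg.inv(frame.intrinsics) @ pix) * frame.depth.reshape(1, -1)  # rays scaled by depth
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])                   # homogeneous camera coords
    world = (np.linalg.inv(frame.extrinsics) @ cam_h)[:3].T                # camera -> world
    return world
```

Stacking such per-frame point clouds over time is one way to view the "4D" signal (3D geometry plus temporal dynamics) that the dataset's annotations are meant to supervise.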
Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality data for 4D world modeling
Existing datasets lack dynamic complexity and diversity
Need for better spatial-temporal annotations and multi-modal coverage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale multi-domain multi-modal dataset
Richer modality coverage and more realistic dynamic interactions
Fine-tuning SOTA methods on OmniWorld yields significant performance gains (see the sketch below)
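The last point above refers to fine-tuning existing SOTA methods on OmniWorld. The sketch below shows only the generic shape of such a fine-tuning loop; the model, loss choice, and batch keys are placeholders, not the paper's actual training pipeline or any released code.

```python
# Generic fine-tuning loop sketch (PyTorch). The model, loss choice, and
# batch keys ("frames", "target") are placeholders, not the paper's code.
import torch
from torch.utils.data import DataLoader

def finetune(model: torch.nn.Module, dataset, epochs: int = 1, lr: float = 1e-5):
    loader = DataLoader(dataset, batch_size=4, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            pred = model(batch["frames"])                        # e.g. predicted depth or future frames
            loss = torch.nn.functional.l1_loss(pred, batch["target"])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```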