Dreamland: Controllable World Creation with Simulator and Generative Models

📅 2025-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large-scale video generation models lack fine-grained, element-wise controllability, which limits their use in scene editing and embodied agent training. To address this, the authors propose Dreamland, a hybrid world generation framework that combines a physics-based simulator with large-scale pretrained generative models, bridged by a layered world abstraction that encodes pixel-level and object-level semantics and geometry. The key contributions are: (1) a layered intermediate representation between simulator and generative model; (2) off-the-shelf use of existing and future pretrained generative models; and (3) D3Sim, a dataset built to support training and evaluation of hybrid generation pipelines. Experiments report a 50.8% improvement in image quality and a 17.9% gain in controllability over existing baselines, along with potential to improve embodied agent training.

📝 Abstract

Large-scale video generative models can synthesize diverse and realistic visual content for dynamic world creation, but they often lack element-wise controllability, hindering their use in editing scenes and training embodied AI agents. We propose Dreamland, a hybrid world generation framework combining the granular control of a physics-based simulator and the photorealistic content output of large-scale pretrained generative models. In particular, we design a layered world abstraction that encodes both pixel-level and object-level semantics and geometry as an intermediate representation to bridge the simulator and the generative model. This approach enhances controllability, minimizes adaptation cost through early alignment with real-world distributions, and supports off-the-shelf use of existing and future pretrained generative models. We further construct a D3Sim dataset to facilitate the training and evaluation of hybrid generation pipelines. Experiments demonstrate that Dreamland outperforms existing baselines, with 50.8% better image quality and 17.9% stronger controllability, and that it has great potential to enhance embodied agent training. Code and data will be made available.

Problem

Research questions and friction points this paper is trying to address.

Lack of element-wise controllability in large-scale video generative models

Need for hybrid framework combining simulator control and generative realism

Challenges in aligning pixel-level and object-level semantics for world creation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework combining simulator and generative models

Layered world abstraction for enhanced controllability

D3Sim dataset for training hybrid generation pipelines
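
The summary does not spell out how the layered world abstraction is stored, so the following is only a minimal sketch of the general idea, assuming a toy layout: object-level records (class, bounding box, depth) are rasterized into pixel-level semantic and depth maps that a conditional generative model could then consume. Every name here (SceneObject, LayeredWorldAbstraction, rasterize) is hypothetical and is not taken from Dreamland's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    """Object-level record (hypothetical layout, not the paper's format)."""
    object_id: int
    semantic_class: str
    box: Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive
    depth: float                    # distance from the camera


@dataclass
class LayeredWorldAbstraction:
    """Toy intermediate representation: objects plus derived pixel maps."""
    height: int
    width: int
    objects: List[SceneObject] = field(default_factory=list)

    def rasterize(self, background: str = "background"):
        """Project object-level records onto pixel-level semantic and depth maps.

        Nearer objects (smaller depth) overwrite farther ones at each pixel.
        """
        far = float("inf")
        semantic = [[background] * self.width for _ in range(self.height)]
        depth = [[far] * self.width for _ in range(self.height)]
        for obj in self.objects:
            r0, c0, r1, c1 = obj.box
            # Clip the box to the image bounds before filling.
            for r in range(max(r0, 0), min(r1, self.height)):
                for c in range(max(c0, 0), min(c1, self.width)):
                    if obj.depth < depth[r][c]:
                        depth[r][c] = obj.depth
                        semantic[r][c] = obj.semantic_class
        return semantic, depth


# Assumed example scene: a car partly occluded by a nearer pedestrian.
world = LayeredWorldAbstraction(
    height=4,
    width=6,
    objects=[
        SceneObject(1, "car", (1, 1, 3, 4), depth=10.0),
        SceneObject(2, "pedestrian", (0, 3, 4, 5), depth=5.0),
    ],
)
semantic_map, depth_map = world.rasterize()
```

In this sketch, element-wise control amounts to editing one SceneObject (for example, changing its box) and calling rasterize again; only the pixels covered by that object change in the conditioning maps.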
