Dreamland: Controllable World Creation with Simulator and Generative Models

📅 2025-06-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large-scale video generation models suffer from insufficient fine-grained controllability, limiting their applicability in scene editing and embodied agent training. To address this, we propose Dreamland, a hybrid world-building framework that pioneers a synergistic architecture integrating physics simulators with large generative models, unified via hierarchical world abstractions bridging pixel-level and object-level semantic-geometric representations. Our key contributions are: (1) a generalizable hierarchical intermediate representation; (2) a plug-and-play mechanism for integrating pre-trained generative models; and (3) D3Sim, a synthetic dataset designed specifically for evaluating hybrid generative systems. Experiments demonstrate a 50.8% improvement in image quality, a 17.9% gain in controllability, and substantial enhancement in embodied agent policy learning performance.

πŸ“ Abstract
Large-scale video generative models can synthesize diverse and realistic visual content for dynamic world creation, but they often lack element-wise controllability, hindering their use in editing scenes and training embodied AI agents. We propose Dreamland, a hybrid world generation framework combining the granular control of a physics-based simulator with the photorealistic content output of large-scale pretrained generative models. In particular, we design a layered world abstraction that encodes both pixel-level and object-level semantics and geometry as an intermediate representation to bridge the simulator and the generative model. This approach enhances controllability, minimizes adaptation cost through early alignment with real-world distributions, and supports off-the-shelf use of existing and future pretrained generative models. We further construct the D3Sim dataset to facilitate the training and evaluation of hybrid generation pipelines. Experiments demonstrate that Dreamland outperforms existing baselines, with 50.8% improved image quality and 17.9% stronger controllability, and shows great potential to enhance embodied agent training. Code and data will be made available.
Problem

Research questions and friction points this paper is trying to address.

Lack of element-wise controllability in large-scale video generative models
Need for a hybrid framework combining simulator control with generative realism
Challenges in aligning pixel-level and object-level semantics for world creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework combining simulator and generative models
Layered world abstraction for enhanced controllability
D3Sim dataset for training hybrid generation pipelines
Sicheng Mo
University of California, Los Angeles
Computer Vision
Ziyang Leng
University of California, Los Angeles
Leon Liu
University of California, Los Angeles
Weizhen Wang
University of California, Los Angeles
Honglin He
UCLA
Bolei Zhou
Associate Professor at UCLA
Computer Vision · Robotics · Artificial Intelligence