WonderWorld: Interactive 3D Scene Generation from a Single Image

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 19
Influential: 2
🤖 AI Summary
To address slow generation and geometric incoherence when creating interactive 3D scenes from a single image, this paper introduces Fast Layered Gaussian Surfels (FLAGS), a new scene representation, together with an algorithm that generates it from a single view without producing multiple views. A geometry-based initialization significantly reduces optimization time, and guided depth diffusion allows depth estimation to be partially conditioned, so newly generated geometry stays coherent with existing scene content. Users can interactively specify scene contents and layout, and WonderWorld generates connected and diverse 3D scenes in under 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. The authors state that they will release full code and software.

📝 Abstract
We present WonderWorld, a novel framework for interactive 3D scene generation that enables users to interactively specify scene contents and layout and see the created scenes with low latency. The major challenge lies in achieving fast generation of 3D scenes. Existing scene generation approaches are too slow because they often require (1) progressively generating many views and depth maps, and (2) time-consuming optimization of the scene geometry representations. We introduce the Fast Layered Gaussian Surfels (FLAGS) as our scene representation and an algorithm to generate it from a single view. Our approach does not need multiple views, and it leverages a geometry-based initialization that significantly reduces optimization time. Another challenge is generating coherent geometry that allows all scenes to be connected. We introduce the guided depth diffusion that allows partial conditioning of depth estimation. WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. We demonstrate the potential of WonderWorld for user-driven content creation and exploration in virtual environments. We will release full code and software for reproducibility. Project website: https://kovenyu.com/WonderWorld/.
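
The abstract credits much of the speed to a geometry-based initialization of FLAGS from a single view. The sketch below shows one plausible reading of that idea, not the paper's exact recipe: every pixel with an estimated depth is back-projected into a flat, disk-shaped surfel that already sits on the surface, so little optimization is left to do. Assumptions not taken from the source: a pinhole camera with focal length `f` and principal point `(cx, cy)`, per-pixel normals from some normal estimator, a disk radius of one pixel footprint (`depth / f`), an initial opacity of 0.9, and the layer labels. `Surfel` and `init_surfels` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Surfel:
    position: Vec3  # disk center in camera coordinates
    normal: Vec3    # unit normal of the flat disk (from an assumed normal estimator)
    scale: float    # disk radius; here one pixel footprint, depth / f (assumption)
    color: Vec3     # RGB in [0, 1], copied from the input pixel
    opacity: float  # starts high so that little optimization is needed (assumption)
    layer: str      # hypothetical layer label, e.g. "foreground", "background", "sky"


def init_surfels(
    depth: Sequence[Sequence[float]],
    normals: Sequence[Sequence[Vec3]],
    colors: Sequence[Sequence[Vec3]],
    layer_ids: Sequence[Sequence[str]],
    f: float,
    cx: float,
    cy: float,
    opacity: float = 0.9,
) -> List[Surfel]:
    """Back-project every pixel with a valid depth into one surfel (pinhole camera)."""
    surfels: List[Surfel] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:  # no depth estimate for this pixel: skip it
                continue
            x = (u - cx) * d / f  # pinhole back-projection of pixel (u, v)
            y = (v - cy) * d / f
            surfels.append(
                Surfel((x, y, d), normals[v][u], d / f, colors[v][u], opacity, layer_ids[v][u])
            )
    return surfels
```

For example, a 2×2 depth map with one zero entry yields three surfels, and a pixel at depth 4.0 gets twice the radius of a pixel at depth 2.0, so farther surfels still cover their pixel without gaps. Because positions, orientations, and sizes come straight from the estimated geometry, any later optimization presumably only needs to refine them rather than discover them.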

Problem

Research questions and friction points this paper is trying to address.

Fast interactive 3D scene generation from a single image
Reducing optimization time for scene geometry representation
Ensuring coherent geometry for connected, diverse 3D scenes

Innovation

Methods, ideas, or system contributions that make the work stand out.

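Uses the Fast Layered Gaussian Surfels (FLAGS) scene representation, generated from a single view
Leverages a geometry-based initialization that significantly reduces optimization time
Employs guided depth diffusion, which partially conditions depth estimation, to keep geometry coherent across connected scenes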
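
Guided depth diffusion lets depth estimation be conditioned on only part of the image, for instance (presumably) pixels whose depth is already fixed by previously generated scene content, so new geometry connects to it. The toy below illustrates the general idea of guidance-based partial conditioning; it is not the paper's method. Assumptions: the box-blur `toy_denoiser` is a hypothetical stand-in for a learned depth diffusion model, guidance is a gradient step on a masked L2 loss applied to the predicted clean depth, the noise schedule is linear in the cumulative signal level, and sampling uses deterministic DDIM updates. `toy_denoiser` and `guided_depth_sampling` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def toy_denoiser(x_t: Sequence[float], a_bar: float) -> List[float]:
    """Hypothetical stand-in for a learned model: predict clean depth x0 from x_t.

    Undo the signal scaling, then apply a 3-tap box blur as a crude smoothness prior.
    """
    x = [v / math.sqrt(a_bar) for v in x_t]
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out


def guided_depth_sampling(
    guide: Sequence[float],
    mask: Sequence[float],
    steps: int = 20,
    weight: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Deterministic DDIM-style sampling with guidance on the masked (known) pixels."""
    rng = random.Random(seed)
    # Cumulative signal level a_bar rises from 0.01 (very noisy) to exactly 1.0 (clean).
    a_bars = [1.0 - 0.99 * (steps - k) / steps for k in range(steps + 1)]
    x = [rng.gauss(0.0, 1.0) for _ in guide]  # start from pure noise
    for k in range(steps):
        a, a_next = a_bars[k], a_bars[k + 1]
        x0 = toy_denoiser(x, a)
        # Noise implied by the unguided prediction: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps
        eps = [(xt - math.sqrt(a) * v) / math.sqrt(1.0 - a) for xt, v in zip(x, x0)]
        # Guidance: gradient step on 0.5 * sum_i m_i * (x0_i - g_i)^2, known pixels only
        x0 = [v - weight * m * (v - g) for v, m, g in zip(x0, mask, guide)]
        # DDIM update (eta = 0) to the next, less noisy level
        x = [math.sqrt(a_next) * v + math.sqrt(1.0 - a_next) * e for v, e in zip(x0, eps)]
    return x
```

For example, `guided_depth_sampling([2.0, 2.0, 0.0, 0.0, 0.0], [1, 1, 0, 0, 0])` returns a five-element depth profile whose first two entries equal 2.0 up to floating-point rounding, while the unmasked entries are shaped by the noise and the blur prior. With `weight=1.0` the known pixels are pinned exactly at the final step; smaller weights give softer guidance. In this toy, the blur is what lets the constraint on known pixels influence their unknown neighbors, which mirrors the coherence that partial conditioning is meant to provide.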