Controllable 3D Outdoor Scene Generation via Scene Graphs

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D outdoor scene generation methods offer weak user controllability or rely on non-intuitive, imprecise conditioning inputs. This paper introduces what the authors describe as the first scene-graph-based controllable framework for 3D outdoor scene generation. A compact, semantic scene graph is encoded and mapped from its sparse graph form to a dense bird's-eye view (BEV) embedding map, which conditions a diffusion model that synthesizes large-scale 3D semantic scenes. Key contributions include: (1) the use of scene graphs as structured, semantic control signals for 3D outdoor generation; (2) a large-scale paired dataset linking scene graphs with corresponding 3D semantic scenes; and (3) an interactive system in which users create or modify scene graphs to control the generated layout. Experiments show that the generated 3D urban scenes are high quality and closely aligned with the input scene graphs.

📝 Abstract

Three-dimensional scene generation is crucial in computer vision, with applications spanning autonomous driving, gaming, and the metaverse. Current methods either lack user control or rely on imprecise, non-intuitive conditions. In this work, we propose a method that uses scene graphs, an accessible, user-friendly control format, to generate outdoor 3D scenes. We develop an interactive system that transforms a sparse scene graph into a dense BEV (Bird's Eye View) Embedding Map, which guides a conditional diffusion model to generate 3D scenes that match the scene graph description. During inference, users can easily create or modify scene graphs to generate large-scale outdoor scenes. We create a large-scale dataset with paired scene graphs and 3D semantic scenes to train the BEV embedding and diffusion models. Experimental results show that our approach consistently produces high-quality 3D urban scenes closely aligned with the input scene graphs. To the best of our knowledge, this is the first approach to generate 3D outdoor scenes conditioned on scene graphs.
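To make the sparse-to-dense idea concrete, here is a minimal Python sketch of a scene graph being painted into a dense BEV grid. It is an illustration only, not the paper's method: the abstract says the BEV embedding model is trained, whereas this sketch uses a hand-written rasterizer. The names `Node`, `SceneGraph`, `category_embedding`, and `graph_to_bev`, the per-node `x`, `y`, and `radius` layout hints, and the pseudo-random embeddings are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Node:
    """One scene-graph node. The (x, y) cell and radius are hypothetical
    layout hints added for this sketch; they are not part of the paper."""
    category: str
    x: int
    y: int
    radius: int = 1


@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src_idx, relation, dst_idx)


def category_embedding(category, dim):
    """Deterministic pseudo-embedding per category (stand-in for a learned one)."""
    seed = sum(ord(ch) for ch in category)
    return np.random.default_rng(seed).standard_normal(dim)


def graph_to_bev(graph, size=16, dim=4):
    """Paint each node's embedding into a dense (dim, size, size) BEV map.

    Later nodes overwrite earlier ones where their footprints overlap.
    Edges are stored on the graph but ignored by this toy rasterizer.
    """
    bev = np.zeros((dim, size, size))
    for node in graph.nodes:
        emb = category_embedding(node.category, dim)
        y0, y1 = max(node.y - node.radius, 0), min(node.y + node.radius + 1, size)
        x0, x1 = max(node.x - node.radius, 0), min(node.x + node.radius + 1, size)
        bev[:, y0:y1, x0:x1] = emb[:, None, None]
    return bev


# A two-node graph: a car on a road.
graph = SceneGraph(
    nodes=[Node("road", 8, 8, radius=3), Node("car", 8, 8, radius=0)],
    edges=[(1, "on", 0)],
)
bev = graph_to_bev(graph)
```

Editing the graph (adding a node, changing a category) and calling `graph_to_bev` again yields a new dense map, which mirrors the create-or-modify workflow the abstract describes, under the simplifying assumptions above.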

Problem

Research questions and friction points this paper is trying to address.

Generates 3D outdoor scenes using user-friendly scene graphs.

Transforms sparse scene graphs into dense BEV embedding maps.

Produces high-quality 3D urban scenes aligned with input graphs.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses scene graphs for user-friendly 3D scene control.

Transforms sparse graphs into dense BEV embedding maps.

Employs conditional diffusion models for scene generation.
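How a dense condition map can steer a diffusion sampler is sketched below in Python. This is a generic DDPM-style ancestral sampling loop, not the paper's architecture: the source does not specify the noise schedule, the sampler, or the network, so `make_schedule`, `toy_denoiser`, `reverse_step`, and `sample` are hypothetical names, the denoiser is an untrained placeholder, and a 2D grid stands in for the 3D semantic scene to keep the example short.

```python
import numpy as np


def make_schedule(steps=10, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption; the source does not name one)."""
    betas = np.linspace(beta_start, beta_end, steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def toy_denoiser(x_t, t, cond):
    """Placeholder for a trained network eps_theta(x_t, t, cond).

    It concatenates the condition channels with the noisy sample and
    averages over channels, so the output depends on the condition.
    """
    stacked = np.concatenate([x_t, cond], axis=0)
    return np.broadcast_to(stacked.mean(axis=0, keepdims=True), x_t.shape)


def reverse_step(x_t, t, cond, schedule, rng):
    """One DDPM reverse step with variance beta_t."""
    betas, alphas, alpha_bars = schedule
    eps = toy_denoiser(x_t, t, cond)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


def sample(cond, channels=2, steps=10, seed=0):
    """Start from Gaussian noise and denoise under the given condition map."""
    rng = np.random.default_rng(seed)
    schedule = make_schedule(steps)
    x = rng.standard_normal((channels,) + cond.shape[1:])
    for t in reversed(range(steps)):
        x = reverse_step(x, t, cond, schedule, rng)
    return x


# A stand-in condition map with 4 embedding channels on an 8x8 grid.
cond = np.ones((4, 8, 8))
scene = sample(cond)
```

The condition enters only through the denoiser's input, so changing the condition map changes every denoising step while the sampling loop itself stays the same.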

🔎 Similar Papers

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion