🤖 AI Summary
Existing text-to-3D approaches struggle to generate large-scale, geometrically consistent indoor scenes that faithfully adhere to both user-provided textual descriptions and spatial layout preferences, largely because they rely on single-room assumptions and offer limited control over shape and texture. This work introduces the first rendering-guided paradigm for converting 3D semantic layouts into multi-view proxy images, enabling text- and layout-driven, end-to-end generation of apartment-scale, multi-room 3D scenes. The method combines three stages: rendering the 3D semantic layout into per-view maps, generating multi-view images with a semantic- and depth-conditioned diffusion model, and optimizing a NeRF from those images. By preserving geometric consistency throughout the pipeline, it improves texture diversity and visual realism. Notably, it is the first method to support high-fidelity, end-to-end synthesis of irregularly structured, multi-bedroom apartments, overcoming longstanding limitations in controllability, scalability, and scene complexity.
📝 Abstract
The creation of complex 3D scenes tailored to user specifications has been a tedious and challenging task with traditional 3D modeling tools. Although some pioneering methods have achieved automatic text-to-3D generation, they are generally limited to small-scale scenes with restricted control over shape and texture. We introduce SceneCraft, a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences provided by users. Central to our method is a rendering-based technique, which converts 3D semantic layouts into multi-view 2D proxy maps. Furthermore, we design a semantic- and depth-conditioned diffusion model to generate multi-view images, which are used to learn a neural radiance field (NeRF) as the final scene representation. Free from the constraints of panoramic image generation, we surpass previous methods in supporting complicated indoor spaces beyond a single room, up to a whole multi-bedroom apartment with irregular shapes and layouts. Through experimental analysis, we demonstrate that our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality. Code and more results are available at: https://orangesodahub.github.io/SceneCraft
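To make the three-stage pipeline concrete, here is a minimal Python sketch of the control flow: layout rasterization into per-view semantic/depth proxy maps, conditioned diffusion sampling per view, and NeRF fitting from the generated images. Every name in it (`render_layout_maps`, `diffusion_sample`, `fit_nerf`, and the placeholder types) is a hypothetical illustration of the idea described above, not the actual SceneCraft API.

```python
# Conceptual sketch of the pipeline; all names are hypothetical placeholders.
from dataclasses import dataclass
from typing import Any, List

Layout = Any  # user-provided 3D semantic layout (e.g., labeled room/box geometry)
Camera = Any  # camera pose for one rendered view
Image = Any   # 2D image / map tensor


@dataclass
class ProxyMaps:
    semantic: Image  # per-pixel semantic labels rendered from the layout
    depth: Image     # per-pixel depth rendered from the layout


def render_layout_maps(layout: Layout, camera: Camera) -> ProxyMaps:
    """Stage 1 (stub): rasterize the 3D semantic layout into 2D proxy maps."""
    raise NotImplementedError


def diffusion_sample(prompt: str, proxies: ProxyMaps) -> Image:
    """Stage 2 (stub): sample one view from a semantic- and
    depth-conditioned diffusion model."""
    raise NotImplementedError


def fit_nerf(images: List[Image], cameras: List[Camera],
             depth_priors: List[Image]) -> Any:
    """Stage 3 (stub): optimize a NeRF against the generated multi-view images."""
    raise NotImplementedError


def generate_scene(layout: Layout, text_prompt: str,
                   cameras: List[Camera]) -> Any:
    # 1) Render the layout into semantic/depth proxy maps for each camera pose.
    proxies = [render_layout_maps(layout, cam) for cam in cameras]
    # 2) Paint each view with the conditioned diffusion model, so generated
    #    images follow both the text prompt and the layout geometry.
    images = [diffusion_sample(text_prompt, p) for p in proxies]
    # 3) Distill the multi-view images (with depth priors from the proxies)
    #    into a NeRF, the final consistent 3D scene representation.
    return fit_nerf(images, cameras, [p.depth for p in proxies])
```

Because the diffusion model sees the same rendered geometry that later supervises the NeRF, the per-view images stay mutually consistent, which is what lets the approach scale past a single room under this design.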