RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation

📅 2025-11-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work introduces the first end-to-end, fully automatic text-to-3D indoor scene generation framework, capable of synthesizing geometrically plausible and semantically coherent room layouts from short natural language descriptions, without manual design, panoramic image guidance, or human intervention. Methodologically, it employs a hierarchical language-agent planner to parse spatial semantics and generate structured layout instructions, coupled with 3D Gaussian splatting initialization and explicit spatial constraint optimization for geometric modeling. To enable efficient rendering and interactive editing, it integrates AnyReach camera trajectory sampling and Interval Timestep Flow Sampling (ITFS). Experiments demonstrate that the framework generates high-fidelity scenes within 30 minutes, outperforming state-of-the-art methods in visual fidelity, layout rationality, and rendering speed, while supporting flexible post-generation editing.
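The listing does not reproduce the planner's output format. As a rough illustration of what "structured layout instructions" carrying per-object spatial and semantic attributes could look like, here is a minimal Python sketch; every class, field name, and the `plan_scene` stub are hypothetical, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectSpec:
    """One object record in a planner-style scene description.
    All field names are assumptions; the paper's schema is not published here."""
    name: str                                  # semantic label, e.g. "double bed"
    size: tuple[float, float, float]           # (width, depth, height) in meters
    position: tuple[float, float]              # (x, y) floor placement, to be optimized
    rotation_deg: float = 0.0                  # yaw around the vertical axis
    tags: list[str] = field(default_factory=list)  # semantic attributes

@dataclass
class SceneDescription:
    room_size: tuple[float, float, float]      # room bounding box in meters
    style: str                                 # e.g. "cozy bedroom"
    objects: list[ObjectSpec] = field(default_factory=list)

def plan_scene(prompt: str) -> SceneDescription:
    """Stand-in for the hierarchical agent planner: a real system would issue
    LLM calls that expand `prompt` into this structure. Hardcoded example parse."""
    return SceneDescription(
        room_size=(4.0, 3.5, 2.8),
        style="cozy bedroom",
        objects=[
            ObjectSpec("double bed", (1.6, 2.0, 0.5), (1.0, 1.0), tags=["soft", "wood frame"]),
            ObjectSpec("nightstand", (0.5, 0.4, 0.6), (2.8, 0.8), tags=["wood"]),
        ],
    )

print(plan_scene("a cozy bedroom"))
```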

📝 Abstract
In this paper, we propose RoomPlanner, the first fully automatic 3D room generation framework for painlessly creating realistic indoor scenes with only short text as input. Without any manual layout design or panoramic image guidance, our framework can generate explicit layout criteria for rational spatial placement. We begin by introducing a hierarchical structure of language-driven agent planners that can automatically parse short and ambiguous prompts into detailed scene descriptions. These descriptions include raw spatial and semantic attributes for each object and the background, which are then used to initialize 3D point clouds. To position objects within bounded environments, we implement two arrangement constraints that iteratively optimize spatial arrangements, ensuring a collision-free and accessible layout solution. In the final rendering stage, we propose a novel AnyReach Sampling strategy for camera trajectory, along with the Interval Timestep Flow Sampling (ITFS) strategy, to efficiently optimize the coarse 3D Gaussian scene representation. These approaches help reduce the total generation time to under 30 minutes. Extensive experiments demonstrate that our method can produce geometrically rational 3D indoor scenes, surpassing prior approaches in both rendering speed and visual quality while preserving editability. The code will be available soon.
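The abstract names two arrangement constraints, yielding a collision-free and accessible layout, that are optimized iteratively, but gives no formulation. The toy sketch below illustrates the general idea on 2D floor-plane boxes: overlapping objects are pushed apart, and every object is kept a walkway margin away from the walls. The resolution rule, `margin` parameter, and box representation are all assumptions, not the paper's method.

```python
def overlap(a, b):
    """Overlap of two axis-aligned boxes a, b = (x, y, w, d);
    positive in both axes means a collision."""
    ox = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    oy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return ox, oy

def arrange(boxes, room_w, room_d, margin=0.4, iters=200):
    """Toy stand-in for iterative constraint optimization: push colliding
    boxes apart (collision-free constraint) and keep a `margin` walkway to
    the walls (accessibility constraint). Boxes are mutable [x, y, w, d]."""
    for _ in range(iters):
        moved = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                ox, oy = overlap(boxes[i], boxes[j])
                if ox > 0 and oy > 0:   # collision: separate along the smaller-overlap axis
                    axis = 0 if ox < oy else 1
                    shift = (ox if axis == 0 else oy) / 2 + 1e-3
                    sign = 1 if boxes[i][axis] > boxes[j][axis] else -1
                    boxes[i][axis] += sign * shift
                    boxes[j][axis] -= sign * shift
                    moved = True
        for b in boxes:                 # accessibility: clamp inside the walkway margin
            b[0] = min(max(b[0], margin), room_w - margin - b[2])
            b[1] = min(max(b[1], margin), room_d - margin - b[3])
        if not moved:
            break
    return boxes

# Example: a bed and a nightstand that start overlapping in a 4.0 x 3.5 m room.
print(arrange([[1.0, 1.0, 1.6, 2.0], [1.2, 1.1, 0.5, 0.4]], room_w=4.0, room_d=3.5))
```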
Problem

Research questions and friction points this paper is trying to address.

Automatically generating realistic 3D rooms from short text prompts
Creating collision-free object layouts without manual design guidance
Reducing generation time while maintaining rendering quality and editability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical agent planners parse prompts into scene descriptions
Two arrangement constraints optimize collision-free object layouts
AnyReach and ITFS sampling accelerate 3D Gaussian rendering (see the sketch after this list)
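Neither AnyReach Sampling nor ITFS is specified beyond its name in this listing. One plausible reading of Interval Timestep Flow Sampling, offered strictly as an assumption, is that the flow/diffusion timestep used to supervise the 3D Gaussian scene is drawn from a restricted interval that narrows as optimization progresses, so late iterations refine with low-noise timesteps only:

```python
import random

def itfs_timestep(step, total_steps, t_hi=0.98, t_lo=0.02, shrink=0.5):
    """Hypothetical Interval Timestep Flow Sampling: draw the flow timestep
    uniformly from an interval whose upper bound anneals downward over
    optimization. The parameter names and schedule are assumptions; the
    paper's actual ITFS formulation may differ."""
    progress = step / max(total_steps - 1, 1)
    hi = t_hi - shrink * progress * (t_hi - t_lo)
    return random.uniform(t_lo, hi)

# Example: timesteps sampled early vs. late in optimization.
print(itfs_timestep(0, 1000), itfs_timestep(999, 1000))
```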
Wenzhuo Sun
Monash University, Melbourne, Australia
Mingjian Liang
Monash University, Melbourne, Australia
Wenxuan Song
The Hong Kong University of Science and Technology (Guangzhou)
Vision-language-action Model, Robotics
Xuelian Cheng
Monash University
3D Vision, Medical Imaging, Machine Learning
Zongyuan Ge
Monash University, Melbourne, Australia