🤖 AI Summary
Existing approaches to 3D scene layout generation face three key limitations: (1) optimization-based methods rely heavily on hand-crafted rules; (2) generative models struggle to simultaneously ensure content diversity and accurate spatial relationship modeling; and (3) LLM-based methods lack robustness. To address these limitations, we propose a vision-guided image-to-3D-layout semantic parsing framework. We introduce a novel dataset comprising 2,037 3D assets and 147 high-quality layouts. Our method features an image parsing module that jointly extracts visual semantics and geometric priors, augmented by an explicit scene graph representation to enforce logical consistency in layout generation. Additionally, we leverage diffusion-based image generation models to enhance prompt expressivity. User studies demonstrate that our approach significantly outperforms state-of-the-art methods in layout richness, artistic quality, and spatial plausibility. The code and dataset are publicly released.
📝 Abstract
Generating artistic and coherent 3D scene layouts is crucial for digital content creation. Traditional optimization-based methods are often constrained by cumbersome manual rules, while deep generative models struggle to produce rich and diverse content. Furthermore, approaches that rely on large language models frequently lack robustness and fail to accurately capture complex spatial relationships. To address these challenges, this paper presents a novel vision-guided 3D layout generation system. We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. We then employ an image generation model to expand prompts into images, fine-tuning it to align with our asset library. Next, we develop a robust image parsing module that recovers the 3D layout of scenes from visual semantics and geometric information. Finally, we optimize the scene layout using scene graphs and overall visual semantics to ensure logical coherence and alignment with the images. Extensive user studies demonstrate that our algorithm significantly outperforms existing methods in layout richness and quality. The code and dataset will be available at https://github.com/HiHiAllen/Imaginarium.