Toward Scene Graph and Layout Guided Complex 3D Scene Generation

📅 2024-12-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Text-to-3D generation of complex scenes faces three challenges: modeling multi-object interactions, keeping layouts coherent, and preventing appearance leakage across objects. This paper proposes GraLa3D, a graph-based framework that addresses them. GraLa3D builds a structured scene graph of single-object nodes and composite super-nodes to explicitly encode spatial and semantic relationships among objects, and attaches layout bounding boxes to the graph. An LLM parses the text prompt into this graph, which then guides layout-aware 3D generation. The framework targets a limitation of existing methods, which rely largely on score distillation sampling (SDS) and therefore struggle to manipulate multiple objects with specific interactions. Within a super-node, it models the interactions between objects while reducing appearance leakage among them. The reported experiments indicate that GraLa3D generates complex multi-object 3D scenes that align closely with their text prompts.

📝 Abstract

Recent advancements in object-centric text-to-3D generation have shown impressive results. However, generating complex 3D scenes remains an open challenge due to the intricate relations between objects. Moreover, existing methods are largely based on score distillation sampling (SDS), which constrains the ability to manipulate multiple objects with specific interactions. Addressing these critical yet underexplored issues, we present a novel framework of Scene Graph and Layout Guided 3D Scene Generation (GraLa3D). Given a text prompt describing a complex 3D scene, GraLa3D uses an LLM to model the scene as a scene graph with layout bounding box information. GraLa3D uniquely constructs the scene graph with single-object nodes and composite super-nodes. In addition to constraining 3D generation within the desired layout, a major contribution lies in modeling the interactions between objects in a super-node while alleviating appearance leakage across objects within such nodes. Our experiments confirm that GraLa3D overcomes the above limitations and generates complex 3D scenes closely aligned with text prompts.
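The scene graph described in the abstract can be pictured as a small data structure. The Python sketch below is illustrative only and is not the authors' implementation: the class names (`Box3D`, `ObjectNode`, `SuperNode`, `SceneGraph`), the axis-aligned box format, and the rule that each interaction triple becomes one super-node whose box is the union of its members' boxes are all assumptions made for this example. In GraLa3D the objects, relations, and boxes would come from an LLM; here they are written by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned layout box given by its center and edge lengths (assumed format)."""
    center: tuple[float, float, float]
    size: tuple[float, float, float]

    def bounds(self):
        """Return the (min corner, max corner) of the box."""
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi


def union_box(boxes):
    """Smallest axis-aligned box that encloses all given boxes."""
    los, his = zip(*(b.bounds() for b in boxes))
    lo = tuple(min(axis) for axis in zip(*los))
    hi = tuple(max(axis) for axis in zip(*his))
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple(b - a for a, b in zip(lo, hi))
    return Box3D(center, size)


@dataclass
class ObjectNode:
    """Single-object node: one object with its own layout box."""
    name: str
    box: Box3D


@dataclass
class SuperNode:
    """Composite node: objects that interact and are handled together."""
    members: list[ObjectNode]
    relation: str

    @property
    def box(self) -> Box3D:
        return union_box([m.box for m in self.members])

    @property
    def prompt(self) -> str:
        # Joint description of the interacting objects (hypothetical format).
        return f" {self.relation} ".join(m.name for m in self.members)


@dataclass
class SceneGraph:
    single_nodes: list[ObjectNode] = field(default_factory=list)
    super_nodes: list[SuperNode] = field(default_factory=list)


def build_scene_graph(objects, interactions):
    """Group interacting objects into super-nodes; the rest stay single nodes.

    `interactions` is a list of (subject, relation, object) name triples.
    """
    by_name = {o.name: o for o in objects}
    graph = SceneGraph()
    grouped = set()
    for subj, relation, obj in interactions:
        graph.super_nodes.append(SuperNode([by_name[subj], by_name[obj]], relation))
        grouped.update((subj, obj))
    graph.single_nodes = [o for o in objects if o.name not in grouped]
    return graph


# Hand-written stand-in for the LLM output on a prompt such as
# "an astronaut riding a horse next to a tree".
objects = [
    ObjectNode("astronaut", Box3D((0.0, 1.0, 0.0), (1.0, 1.0, 1.0))),
    ObjectNode("horse", Box3D((0.0, 0.0, 0.0), (2.0, 1.0, 1.0))),
    ObjectNode("tree", Box3D((3.0, 1.0, 0.0), (1.0, 3.0, 1.0))),
]
graph = build_scene_graph(objects, [("astronaut", "riding", "horse")])
```

In this sketch the astronaut and horse share one super-node with a joint prompt and an enclosing box, while the tree remains a single-object node with its own box. How GraLa3D actually optimizes the objects inside a super-node and suppresses appearance leakage is not specified in the abstract and is not modeled here.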

Problem

Research questions and friction points this paper is trying to address.

3D scene generation

complex object interactions

text-to-3D

Innovation

Methods, ideas, or system contributions that make the work stand out.

GraLa3D

Multi-object Interaction

Complex 3D Scene Generation

🔎 Similar Papers

LT3SD: Latent Trees for 3D Scene Diffusion