🤖 AI Summary
Existing language-guided 3D scene editing methods either require manual intervention or support only appearance-level modifications, falling short of fully automatic layout editing. To address this, we propose the first LLM-driven graph diffusion framework for end-to-end, natural-language-guided 3D room layout editing, supporting six structured operations: rotate, translate, scale, replace, add, and remove. To facilitate training and evaluation, we introduce EditRoom-DB, a large-scale language-layout editing dataset of 83k editing pairs built with an automated augmentation pipeline over existing 3D scene synthesis datasets. Our method combines LLM-based command planning with graph-structured diffusion for target-scene generation, unifying semantic instruction understanding with geometric layout generation. Extensive experiments demonstrate state-of-the-art performance in both layout editing accuracy and language-scene alignment.
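As a concrete illustration of this two-stage design, the sketch below shows how an LLM planner might decompose a free-form command into atomic operations that a diffusion-based generator then executes one at a time. All names here (`plan_edits`, `AtomicEdit`, the prompt format, and the `llm`/`editor` interfaces) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of the two-stage flow: an LLM decomposes a free-form
# command into atomic edits; a diffusion-based generator applies each one.
from dataclasses import dataclass
from typing import List, Literal

EditType = Literal["rotate", "translate", "scale", "replace", "add", "remove"]

@dataclass
class AtomicEdit:
    op: EditType   # one of the six supported operations
    target: str    # object the edit applies to, e.g. "chair"
    argument: str  # operation-specific detail, e.g. "90 degrees"

def plan_edits(command: str, llm) -> List[AtomicEdit]:
    """Ask the LLM to break a natural-language command into atomic edits."""
    prompt = (
        "Decompose the following room-editing command into atomic operations "
        "(rotate/translate/scale/replace/add/remove), one per line, "
        f"formatted as 'op | target | argument':\n{command}"
    )
    edits = []
    for line in llm.complete(prompt).strip().splitlines():
        op, target, argument = (part.strip() for part in line.split("|"))
        edits.append(AtomicEdit(op=op, target=target, argument=argument))
    return edits

def apply_edits(scene, edits: List[AtomicEdit], editor):
    """Run each atomic edit through the diffusion-based scene generator."""
    for edit in edits:
        scene = editor.generate(source_scene=scene, edit=edit)
    return scene
```

A command like "swap the sofa for an armchair and push the table toward the window" would decompose into a `replace` edit followed by a `translate` edit, each generating an intermediate scene.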
📝 Abstract
Given the steep learning curve of professional 3D software and the time-consuming process of managing large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However, recent approaches to language-guided 3D scene editing either require manual intervention or focus only on appearance modifications without supporting comprehensive scene layout changes. In response, we propose EditRoom, a unified framework capable of executing a variety of layout edits through natural language commands, without requiring manual intervention. Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes using a diffusion-based method, enabling six types of edits: rotate, translate, scale, replace, add, and remove. To address the lack of data for language-guided 3D scene editing, we develop an automatic pipeline to augment existing 3D scene synthesis datasets and introduce EditRoom-DB, a large-scale dataset of 83k editing pairs for training and evaluation. Our experiments demonstrate that our approach consistently outperforms existing baselines across all metrics, indicating higher accuracy and coherence in language-guided scene layout editing.
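For intuition on how such editing pairs could be synthesized automatically, here is a minimal sketch that perturbs a ground-truth layout with one random atomic operation and pairs the before/after scenes with a templated instruction. The object schema (`category`/`position`/`rotation`/`size`) and the instruction templates are assumptions for illustration, not EditRoom-DB's actual format.

```python
import copy
import random

OPS = ["rotate", "translate", "scale", "remove"]  # add/replace need asset lookup

def make_editing_pair(scene, rng=random):
    """Create one (source scene, instruction, target scene) training triple.

    Minimal sketch: apply one random atomic operation to a clean scene and
    pair the before/after layouts with a templated instruction.
    """
    source = scene
    target = copy.deepcopy(scene)
    obj = rng.choice(target["objects"])
    op = rng.choice(OPS)

    if op == "rotate":
        angle = rng.choice([90, 180, 270])
        obj["rotation"] = (obj["rotation"] + angle) % 360
        instruction = f"rotate the {obj['category']} by {angle} degrees"
    elif op == "translate":
        dx, dz = round(rng.uniform(-1, 1), 2), round(rng.uniform(-1, 1), 2)
        obj["position"][0] += dx
        obj["position"][2] += dz
        instruction = f"move the {obj['category']} by ({dx}, {dz}) meters"
    elif op == "scale":
        factor = rng.choice([0.5, 1.5, 2.0])
        obj["size"] = [s * factor for s in obj["size"]]
        instruction = f"scale the {obj['category']} by a factor of {factor}"
    else:  # remove
        target["objects"].remove(obj)
        instruction = f"remove the {obj['category']} from the room"

    return source, instruction, target
```

Running such a generator over every scene in an existing synthesis dataset, several times per scene, is one plausible way to reach the reported 83k editing pairs.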