Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

📅 2026-05-18

🤖 AI Summary
This work addresses two limitations of existing 3D indoor scene generation methods: text-based methods struggle to capture precise spatial layouts, and image-guided agents become unstable or fall into cyclic behavior when asked to synthesize a whole room. To overcome these challenges, the authors propose a multimodal large language model (MLLM)-based agent framework that reformulates 3D room generation as a Blender scripting task run under a structured execution mechanism. The approach parses scene elements and their spatial relationships from a top-down view, then generates geometry, material, and lighting code in sequential stages. A structured execution controller and a cross-stage memory module are introduced to mitigate context forgetting. The study also presents what the authors describe as the first dedicated evaluation benchmark for code-based 3D room generation. According to the summary, experiments on this benchmark show that the proposed method significantly outperforms existing agent-based approaches, achieving stable, controllable, and high-fidelity end-to-end generation.
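To make the idea of "a room represented as Blender code" concrete, here is a minimal sketch that turns a small layout into a Blender Python script, emitted in the same geometry, materials, lighting order the summary describes. It is an illustration under stated assumptions, not the authors' implementation: the `Box` type, the `emit_blender_script` function, and all default values are hypothetical, and the layout would come from an MLLM parsing the top-down image rather than from hand-written data. The emitted script uses only standard `bpy` calls and is meant to be run inside Blender.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical scene element: an axis-aligned box, units in metres."""
    name: str
    location: tuple[float, float, float]      # centre (x, y, z)
    scale: tuple[float, float, float]         # full extents along x, y, z
    color: tuple[float, float, float, float]  # RGBA, each channel in [0, 1]


def emit_blender_script(boxes: list[Box], light_height: float = 2.5) -> str:
    """Return Blender Python source that builds the boxes, then materials, then a light."""
    lines = ["import bpy", ""]

    # Stage 1: geometry. A unit cube scaled to the box extents.
    lines.append("# --- geometry ---")
    for b in boxes:
        lines += [
            f"bpy.ops.mesh.primitive_cube_add(size=1.0, location={b.location})",
            "obj = bpy.context.active_object",
            f"obj.name = {b.name!r}",
            f"obj.scale = {b.scale}",
        ]

    # Stage 2: materials. One flat-colored material per object, looked up by name.
    lines += ["", "# --- materials ---"]
    for b in boxes:
        lines += [
            f"obj = bpy.data.objects[{b.name!r}]",
            f"mat = bpy.data.materials.new(name={b.name + '_mat'!r})",
            f"mat.diffuse_color = {b.color}",
            "obj.data.materials.append(mat)",
        ]

    # Stage 3: lighting. A single point light above the room origin.
    lines += [
        "",
        "# --- lighting ---",
        "light_data = bpy.data.lights.new(name='ceiling_light', type='POINT')",
        "light_data.energy = 500.0",
        "light_obj = bpy.data.objects.new('ceiling_light', light_data)",
        f"light_obj.location = (0.0, 0.0, {light_height})",
        "bpy.context.collection.objects.link(light_obj)",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    bed = Box("bed", (1.0, 0.5, 0.25), (2.0, 1.6, 0.5), (0.8, 0.8, 0.8, 1.0))
    print(emit_blender_script([bed]))
```

Keeping the three stages as separate sections of the script mirrors the staged generation described above and makes it possible to re-run or repair one stage without regenerating the others.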
📝 Abstract
Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite looping when tasked with holistic room generation from top-down views. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness, which represents 3D rooms as Blender code. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships, and synthesizes executable Blender code for geometry, materials, and lighting in a principled, multi-stage pipeline. A cross-stage memory module is maintained throughout to mitigate the context forgetting inherent to existing agent-based frameworks. We further introduce a dedicated benchmark for code-based 3D room synthesis, encompassing various evaluation protocols. Based on our benchmark, comprehensive comparisons against existing agent-based methods are conducted to validate the effectiveness of our proposed execution harness.
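The abstract names two mechanisms, a structured execution harness and a cross-stage memory, without giving their details. The sketch below shows one plausible shape for them, assuming a fixed stage order, a hard cap on retries per stage (so the agent cannot loop forever), and a shared note store that later stages can read. Every name here (`STAGES`, `Memory`, `StageFn`, `run_pipeline`, `max_attempts`) is hypothetical, and the stage callables stand in for the MLLM calls and Blender execution that a real system would perform.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed fixed stage order, following the abstract: parse, then geometry,
# materials, and lighting.
STAGES = ("parse", "geometry", "materials", "lighting")


@dataclass
class Memory:
    """Cross-stage notes: each finished stage records output that later stages can read."""
    notes: dict[str, str] = field(default_factory=dict)


# A stage reads the memory and returns (succeeded, output). In a real system
# this would prompt an MLLM and execute the generated Blender script.
StageFn = Callable[[Memory], tuple[bool, str]]


def run_pipeline(stages: dict[str, StageFn], max_attempts: int = 3) -> Memory:
    """Run the stages in order, retrying each at most `max_attempts` times."""
    memory = Memory()
    for name in STAGES:
        for _ in range(max_attempts):
            ok, output = stages[name](memory)
            if ok:
                memory.notes[name] = output  # persist for later stages
                break
        else:
            # The bounded retry budget is what rules out infinite looping.
            raise RuntimeError(f"stage {name!r} failed after {max_attempts} attempts")
    return memory


if __name__ == "__main__":
    demo = {name: (lambda mem, n=name: (True, f"{n} done")) for name in STAGES}
    print(run_pipeline(demo).notes)
```

The design choice worth noting is that control flow lives in ordinary code rather than in the model's own decisions: the model fills in each stage, but the order of stages and the stopping condition are fixed by the harness.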
Problem

Research questions and friction points this paper is trying to address.

3D room synthesis
top-down view
spatial information
agent instability
code-based generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic code synthesis
3D room generation
Blender code representation
cross-stage memory
top-down view conditioning