SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation

📅 2026-04-30

🤖 AI Summary

Current large language models often produce erroneous layouts and object collisions when generating 3D indoor scenes, largely because their spatial representations are inadequate. To address this, the work proposes SpatialGrammar, a compilable domain-specific language (DSL) for 3D indoor scenes. It encodes spatial structure as a gravity-aligned, top-down (bird's-eye-view) grid and deterministically compiles it into valid 3D geometry that can be checked for collisions. Building on this DSL, the authors develop SG-Agent, a closed-loop refinement system driven by compiler feedback, and SG-Mini, a lightweight 104M-parameter model trained entirely on compiler-validated synthetic data. Experiments on 159 test scenes show that SG-Agent substantially improves spatial fidelity and physical plausibility, while SG-Mini performs competitively with much larger LLM baselines in single-pass generation.
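To make the representation concrete, here is a minimal sketch of how a gravity-aligned BEV grid placement could compile deterministically into 3D geometry and then be checked for collisions. The `Placement` record, the `CELL_SIZE_M` resolution, and all function names are hypothetical illustrations, not SpatialGrammar's actual syntax or compiler; the sketch assumes every object rests on the floor and is approximated by an axis-aligned box.

```python
from dataclasses import dataclass
from itertools import combinations

CELL_SIZE_M = 0.25  # hypothetical grid resolution: metres per BEV cell


@dataclass(frozen=True)
class Placement:
    """Hypothetical DSL statement: an object's footprint on the top-down grid."""
    name: str
    col: int         # leftmost grid column (x axis)
    row: int         # nearest grid row (z axis)
    width: int       # footprint width in cells
    depth: int       # footprint depth in cells
    height_m: float  # object height in metres


@dataclass(frozen=True)
class Box3D:
    name: str
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]


def compile_placement(p: Placement) -> Box3D:
    """Map a grid footprint to an axis-aligned box resting on the floor (y = 0)."""
    x0, z0 = p.col * CELL_SIZE_M, p.row * CELL_SIZE_M
    return Box3D(
        p.name,
        (x0, 0.0, z0),
        (x0 + p.width * CELL_SIZE_M, p.height_m, z0 + p.depth * CELL_SIZE_M),
    )


def boxes_overlap(a: Box3D, b: Box3D) -> bool:
    """True if the box interiors intersect on every axis; touching faces do not count."""
    return all(a.min_xyz[i] < b.max_xyz[i] and b.min_xyz[i] < a.max_xyz[i] for i in range(3))


def compile_scene(placements: list[Placement]) -> tuple[list[Box3D], list[tuple[str, str]]]:
    """Compile every placement, then report each colliding pair of object names."""
    boxes = [compile_placement(p) for p in placements]
    collisions = [(a.name, b.name) for a, b in combinations(boxes, 2) if boxes_overlap(a, b)]
    return boxes, collisions


# Usage: the desk footprint overlaps the bed; the nightstand only touches it.
bed = Placement("bed", 0, 0, 8, 6, 0.6)
nightstand = Placement("nightstand", 8, 0, 2, 2, 0.5)
desk = Placement("desk", 6, 4, 4, 3, 0.75)
boxes, collisions = compile_scene([bed, nightstand, desk])
print(collisions)  # [('bed', 'desk')]
```

Because footprints are snapped to grid cells and objects stand on the floor, the same placements always compile to the same boxes, which is what makes the constraint check verifiable rather than heuristic.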

📝 Abstract

Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial errors and collisions, in part because common scene representations (raw coordinates or verbose code) make it difficult for models to reason about 3D spatial relationships and physical constraints. We propose SpatialGrammar, a domain-specific language that represents gravity-aligned indoor layouts as bird's-eye-view (BEV) grid placements with deterministic compilation to valid 3D geometry, enabling verifiable constraint checking. Building on this representation, we develop (1) SG-Agent, a closed-loop system that uses compiler feedback to iteratively refine scenes and enforce collision constraints, and (2) SG-Mini, a 104M-parameter model trained entirely on compiler-validated synthetic data. Across 159 test scenes spanning five scenarios of varying complexity, SG-Agent improves spatial fidelity and physical plausibility over prior methods, while SG-Mini performs competitively against larger LLM-based baselines in single-shot generation.
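The closed-loop idea behind SG-Agent (compile, collect errors, revise, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: placements are hypothetical `(name, col, row, width, depth)` tuples with unique, space-free names; "valid" means in-bounds and non-overlapping grid cells; and `shift_right_fix` is a toy deterministic stand-in for the LLM that would read the compiler's error messages.

```python
# Hypothetical placement format: (name, col, row, width_cells, depth_cells).
Placement = tuple[str, int, int, int, int]


def occupied_cells(p: Placement) -> set[tuple[int, int]]:
    """All BEV grid cells covered by a placement's footprint."""
    _, col, row, w, d = p
    return {(c, r) for c in range(col, col + w) for r in range(row, row + d)}


def compiler_feedback(scene: list[Placement], grid_w: int, grid_d: int) -> list[str]:
    """Deterministic checker: returns error messages; an empty list means valid."""
    errors = []
    for name, col, row, w, d in scene:
        if col < 0 or row < 0 or col + w > grid_w or row + d > grid_d:
            errors.append(f"out_of_bounds {name}")
    for i, a in enumerate(scene):
        for b in scene[i + 1:]:
            if occupied_cells(a) & occupied_cells(b):
                errors.append(f"collision {a[0]} {b[0]}")
    return errors


def refine(scene, propose_fix, grid_w, grid_d, max_iters=10):
    """Closed loop: check the scene, hand errors to the proposer, apply its revision."""
    for _ in range(max_iters):
        errors = compiler_feedback(scene, grid_w, grid_d)
        if not errors:
            return scene, True
        scene = propose_fix(scene, errors)
    return scene, not compiler_feedback(scene, grid_w, grid_d)


def shift_right_fix(scene, errors):
    """Toy proposer (stand-in for an LLM): move the second object of the first collision one cell right."""
    for e in errors:
        if e.startswith("collision"):
            _, _, victim = e.split()
            return [(n, c + 1, r, w, d) if n == victim else (n, c, r, w, d)
                    for n, c, r, w, d in scene]
    return scene  # this toy proposer cannot repair other error types


scene = [("bed", 0, 0, 4, 3), ("desk", 2, 1, 2, 2)]
fixed, ok = refine(scene, shift_right_fix, grid_w=10, grid_d=10)
print(ok, fixed)  # True [('bed', 0, 0, 4, 3), ('desk', 4, 1, 2, 2)]
```

Keeping the checker deterministic and separate from the proposer means the proposer can be swapped without touching validation, and the `max_iters` cap bounds cost when the proposer cannot resolve an error.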

Problem

3D indoor scene generation

spatial reasoning

collision avoidance

scene representation

physical plausibility

Innovation

SpatialGrammar

domain-specific language

3D scene generation