Procedural Scene Programs for Open-Universe Scene Generation: LLM-Free Error Correction via Program Search

📅 2025-10-17

🤖 AI Summary

This work addresses layout generation for open-vocabulary text-to-3D scene synthesis. We propose an imperative, procedural paradigm in which a large language model (LLM) builds the layout progressively: objects are placed one at a time, and each object's position and orientation are computed from geometric relations to previously placed objects. A lightweight error correction mechanism based on program search, which runs without the LLM, iteratively improves layout validity while staying as close as possible to the layout the LLM originally generated. We also present an automated evaluation metric that aligns with human preferences. In forced-choice perceptual studies, participants preferred our layouts over those of two declarative baselines 82% and 94% of the time, respectively, and our metric exhibits strong agreement with human judgments (Spearman's ρ > 0.92).

📝 Abstract

Synthesizing 3D scenes from open-vocabulary text descriptions is a challenging, important, and recently popular application. One of its critical subproblems is layout generation: given a set of objects, lay them out to produce a scene matching the input description. Nearly all recent work adopts a declarative paradigm for this problem: using an LLM to generate a specification of constraints between objects, then solving those constraints to produce the final layout. In contrast, we explore an alternative imperative paradigm, in which an LLM iteratively places objects, with each object's position and orientation computed as a function of previously placed objects. The imperative approach allows for a simpler scene specification language while also handling a wider variety and larger complexity of scenes. We further improve the robustness of our imperative scheme by developing an error correction mechanism that iteratively improves the scene's validity while staying as close as possible to the original layout generated by the LLM. In forced-choice perceptual studies, participants preferred layouts generated by our imperative approach 82% and 94% of the time when compared against two declarative layout generation methods. We also present a simple, automated evaluation metric for 3D scene layout generation that aligns well with human preferences.
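
To make the imperative idea concrete, here is a minimal Python sketch of a scene program in which each new object's pose is a function of an object placed earlier. It is an illustration only, not the paper's scene language: the `Placed` record, the `in_front_of` and `right_of` helpers, the 2D floor-plane simplification, and all dimensions are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Placed:
    """Hypothetical record for an object already placed on the floor plane."""
    name: str
    x: float      # center position on the floor plane
    y: float
    yaw: float    # heading of the object's front, in radians
    width: float  # extent perpendicular to the front direction
    depth: float  # extent along the front direction


def in_front_of(name, anchor, width, depth, gap=0.1):
    """Place a new object in front of `anchor`, turned to face it."""
    dist = anchor.depth / 2 + gap + depth / 2
    x = anchor.x + math.cos(anchor.yaw) * dist
    y = anchor.y + math.sin(anchor.yaw) * dist
    return Placed(name, x, y, anchor.yaw + math.pi, width, depth)


def right_of(name, anchor, width, depth, gap=0.1):
    """Place a new object to the right of `anchor`, with the same heading."""
    dist = anchor.width / 2 + gap + width / 2
    # The right-hand direction is the front direction rotated by -90 degrees.
    x = anchor.x + math.sin(anchor.yaw) * dist
    y = anchor.y - math.cos(anchor.yaw) * dist
    return Placed(name, x, y, anchor.yaw, width, depth)


# The "program": statements run in order, each reading earlier placements.
desk = Placed("desk", 0.0, 0.0, 0.0, width=1.2, depth=0.6)
chair = in_front_of("chair", desk, width=0.5, depth=0.5)
lamp = right_of("lamp", desk, width=0.3, depth=0.3)
scene = [desk, chair, lamp]
```

Because every pose is computed directly when its statement runs, no constraint solver is needed; the trade-off is that a poor early placement propagates to later objects, which is the motivation for the error correction step described in the abstract.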

Problem

Research questions and friction points this paper is trying to address.

Generating 3D scenes from open-vocabulary text descriptions

Exploring an imperative paradigm for layout generation as an alternative to declarative methods

Developing error correction to improve scene validity and robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an imperative paradigm in which each object's pose is computed from previously placed objects

Implements error correction via iterative program search

Generates layouts that participants preferred over two declarative baselines in forced-choice perceptual studies (82% and 94% of the time)
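
The error correction idea (improve validity while staying close to the original layout) can be sketched as a small search problem. The Python example below is a deliberately simplified stand-in, not the paper's program search: it uses a greedy local search over object positions, treats objects as axis-aligned boxes `(cx, cy, width, depth)`, counts only pairwise overlap as invalidity, and the names `overlap`, `invalidity`, `correct` and the weight `lam` are all assumptions for illustration.

```python
import itertools


def overlap(a, b):
    """Overlap area of two axis-aligned boxes given as (cx, cy, width, depth)."""
    ox = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    oy = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    return max(ox, 0.0) * max(oy, 0.0)


def invalidity(boxes):
    """Total pairwise overlap area; zero means no two boxes intersect."""
    return sum(overlap(a, b) for a, b in itertools.combinations(boxes, 2))


def correct(boxes, step=0.05, max_iters=200, lam=0.01, tol=1e-12):
    """Greedy search: nudge one box per iteration to reduce overlap,
    with a small penalty `lam` on total drift from the original layout."""
    original = list(boxes)
    current = list(boxes)

    def cost(layout):
        drift = sum(abs(p[0] - q[0]) + abs(p[1] - q[1])
                    for p, q in zip(layout, original))
        return invalidity(layout) + lam * drift

    for _ in range(max_iters):
        if invalidity(current) <= tol:
            break
        best, best_cost = None, cost(current)
        for i, (x, y, w, d) in enumerate(current):
            for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
                cand = current[:i] + [(x + dx, y + dy, w, d)] + current[i + 1:]
                c = cost(cand)
                if c < best_cost:
                    best, best_cost = cand, c
        if best is None:  # no single nudge improves the cost; stop
            break
        current = best
    return current


# Two unit boxes that overlap by 0.2 along x.
fixed = correct([(0.0, 0.0, 1.0, 1.0), (0.8, 0.0, 1.0, 1.0)])
```

The drift penalty is what keeps the corrected layout near the original: among candidate nudges, the search only accepts moves whose overlap reduction outweighs the added displacement, so it stops as soon as the boxes are separated.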

🔎 Similar Papers

SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements