Learning Object Placement Programs for Indoor Scene Synthesis with Iterative Self Training

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing autoregressive indoor scene synthesis methods suffer from incomplete modeling of object spatial distributions, leading to implausible scene layouts. This paper proposes a domain-specific language (DSL)-based procedural object placement framework that predicts plausible positions for new objects within partial scenes via self-generated DSL programs. Our contributions are threefold: (1) the first interpretable, indoor-scene-oriented DSL; (2) a self-guided program induction algorithm integrating iterative self-training with unsupervised program synthesis; and (3) a human-annotated, position-distribution evaluation protocol. Experiments demonstrate strong robustness under sparse data regimes, significantly improved alignment between predicted object spatial distributions and human priors, and generation quality on par with state-of-the-art autoregressive methods.

📝 Abstract
Data-driven, autoregressive indoor scene synthesis systems generate indoor scenes automatically by suggesting and then placing objects one at a time. Empirical observations show that current systems tend to produce incomplete next-object location distributions. We introduce a system that addresses this problem. We design a Domain-Specific Language (DSL) that specifies functional constraints. Programs in our language take as input a partial scene and an object to place; upon execution, they predict possible object placements. We design a generative model that writes these programs automatically. Available 3D scene datasets do not contain programs to train on, so we build on previous work in unsupervised program induction to introduce a new program bootstrapping algorithm. To quantify our empirical observations, we introduce a new evaluation procedure that captures how well a system models per-object location distributions. We ask human annotators to label all the possible places an object can go in a scene, and show that our system produces per-object location distributions more consistent with the human annotations. Our system also generates indoor scenes of quality comparable to previous systems, and while previous systems degrade in performance when training data is sparse, ours does not degrade to the same degree.
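The abstract does not show the DSL itself, but the core idea it describes — a program takes a partial scene plus an object to place, and executing it yields a set of candidate placements — can be sketched roughly. All names below (`near`, `not_overlapping`, the grid discretization) are illustrative assumptions, not the paper's actual language:

```python
# Hypothetical sketch of a DSL-style placement program: each constraint is a
# predicate over candidate positions, and "executing" the program means
# intersecting the constraints over a discretized candidate grid.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    x: float
    y: float

def near(anchor, radius):
    """Constraint: candidate must lie within `radius` of `anchor`."""
    def check(cx, cy):
        return (cx - anchor.x) ** 2 + (cy - anchor.y) ** 2 <= radius ** 2
    return check

def not_overlapping(scene, min_dist):
    """Constraint: candidate must keep `min_dist` from every placed object."""
    def check(cx, cy):
        return all((cx - o.x) ** 2 + (cy - o.y) ** 2 >= min_dist ** 2
                   for o in scene)
    return check

def execute(program, grid):
    """A placement distribution: all grid cells satisfying every constraint."""
    return [(x, y) for x, y in grid if all(c(x, y) for c in program)]

# Partial scene and a coarse candidate grid over a 10m x 10m room.
scene = [Obj("bed", 2.0, 2.0), Obj("desk", 8.0, 2.0)]
grid = [(x * 0.5, y * 0.5) for x in range(20) for y in range(20)]

# "Put the nightstand near the bed, without overlapping anything" as a program:
program = [near(scene[0], 1.5), not_overlapping(scene, 1.0)]
placements = execute(program, grid)
```

The point of the sketch is the output type: a program denotes a *set* of plausible positions rather than a single point, which is what the paper's evaluation protocol compares against human-annotated location distributions.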
Problem

Research questions and friction points this paper is trying to address.

Incomplete next object location distributions in scene synthesis.
Lack of program data for training in 3D scene datasets.
Performance degradation in sparse training data scenarios.
Innovation

Methods, ideas, or system contributions that make the work stand out.

A Domain-Specific Language (DSL) for expressing functional placement constraints.
A generative model that writes placement programs automatically.
A program bootstrapping algorithm enabling unsupervised training.
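The bootstrapping idea in the bullets above — synthesize candidate programs for scenes that have no program labels, keep the ones whose execution reproduces the observed placement, and retrain on the keepers — can be outlined as a loop. The structure is inferred from the abstract; the function names and hooks are hypothetical, not the paper's API:

```python
# Illustrative iterative self-training loop for program induction. The model,
# synthesizer, filter, and trainer are injected so the skeleton stays generic.
def bootstrap(model, scenes, rounds, synthesize, executes_correctly, train):
    """Alternate between proposing programs for unlabeled (scene, object)
    pairs, filtering them by whether execution matches the observed
    placement, and retraining the program-writing model on accepted pairs."""
    dataset = []
    for _ in range(rounds):
        for scene, placed_obj in scenes:
            prog = synthesize(model, scene, placed_obj)      # propose a program
            if executes_correctly(prog, scene, placed_obj):  # self-supervised filter
                dataset.append((scene, placed_obj, prog))
        model = train(model, dataset)                        # retrain on keepers
    return model, dataset
```

The execution-match filter is what substitutes for ground-truth program labels: 3D scene datasets record only final layouts, so correctness of a candidate program is checked against the layout it should reproduce.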
Adrian Chang
Vision Systems Inc., USA and Brown University, USA

Kai Wang
Brown University, USA

Yuanbo Li
Ph.D. Student, Brown University
Computer Graphics · Artificial Intelligence

M. Savva
Simon Fraser University, Canada

Angel X. Chang
Simon Fraser University, Canada

Daniel Ritchie
Brown University
Computer Graphics · Artificial Intelligence