From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries

📅 2025-10-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To generate realistic indoor 3D scenes with varied object poses when real-world data such as ScanNet is scarce, this paper proposes FactoredScenes, a framework that factors a scene into two hierarchically organized components: room programs and object poses. It learns a library of reusable functions that capture common layout patterns, then uses large language models to generate high-level room programs regularized by that library. A program-conditioned model then hierarchically predicts object poses, and 3D objects are retrieved and placed in the scene accordingly. The authors report that the generated rooms are difficult to distinguish from real ScanNet scenes.

📝 Abstract

Real-world scenes, such as those in ScanNet, are difficult to capture, with highly limited data available. Generating realistic scenes with varied object poses remains an open and challenging task. In this work, we propose FactoredScenes, a framework that synthesizes realistic 3D scenes by leveraging the underlying structure of rooms while learning the variation of object poses from lived-in scenes. We introduce a factored representation that decomposes scenes into hierarchically organized concepts of room programs and object poses. To encode structure, FactoredScenes learns a library of functions capturing reusable layout patterns from which scenes are drawn, then uses large language models to generate high-level programs, regularized by the learned library. To represent scene variations, FactoredScenes learns a program-conditioned model to hierarchically predict object poses, and retrieves and places 3D objects in a scene. We show that FactoredScenes generates realistic, real-world rooms that are difficult to distinguish from real ScanNet scenes.
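To make the "room program" half of the factored representation concrete, here is a minimal sketch of what a program built from reusable layout functions could look like. It is not the paper's library or program syntax: `Placement`, `ring_of`, `dining_set`, and `room_program` are hypothetical names invented for illustration, and the layout pattern (chairs evenly spaced around a table) is an assumed example of a reusable pattern.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """An idealized object placement on the floor plane (hypothetical type)."""
    category: str
    x: float
    y: float
    yaw_deg: float  # orientation around the vertical axis, in degrees


def ring_of(category, center, radius, count):
    """Hypothetical library function: `count` objects evenly spaced on a
    circle around `center`, each oriented to face the center."""
    cx, cy = center
    placements = []
    for i in range(count):
        angle = 2 * math.pi * i / count
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        # Facing the center is the opposite of the outward direction.
        yaw = (math.degrees(angle) + 180.0) % 360.0
        placements.append(Placement(category, x, y, yaw))
    return placements


def dining_set(center, chairs=4):
    """Hypothetical library function composed from `ring_of`."""
    table = Placement("table", center[0], center[1], 0.0)
    return [table] + ring_of("chair", center, 1.0, chairs)


def room_program():
    """A hypothetical high-level room program: a few library calls
    that describe the structure of the room, not its exact poses."""
    return dining_set((2.0, 2.0), chairs=4) + [Placement("cabinet", 0.3, 2.0, 90.0)]


layout = room_program()
```

In this sketch the program yields a clean, idealized layout. The variation seen in lived-in rooms is deliberately left out, since the abstract assigns that to a separate, program-conditioned pose model.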

Problem

Research questions and friction points this paper is trying to address.

Generating realistic 3D scenes with varied object poses

Learning reusable layout patterns from limited real-world data

Decomposing scenes into hierarchical programs and object poses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Factored representation hierarchically decomposes scenes

Learned library captures reusable layout patterns

Program-conditioned model hierarchically predicts object poses
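The hierarchical pose prediction named in the list above can be illustrated with a small sketch: a pose is sampled for a group first, and each member is then sampled relative to the already-sampled group. This is only an illustration under strong assumptions. The paper learns its pose model from data, whereas the sketch substitutes fixed Gaussian noise, and `Pose`, `perturb`, and `sample_hierarchical` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    yaw_deg: float


def perturb(pose, pos_sigma, yaw_sigma, rng):
    """Stand-in for a learned pose model: add Gaussian noise (an assumption)."""
    return Pose(
        pose.x + rng.gauss(0.0, pos_sigma),
        pose.y + rng.gauss(0.0, pos_sigma),
        (pose.yaw_deg + rng.gauss(0.0, yaw_sigma)) % 360.0,
    )


def sample_hierarchical(group_pose, child_offsets, rng):
    """Sample the group pose first, then each child relative to it.

    `child_offsets` holds (dx, dy, dyaw_deg) tuples expressed in the
    group's local frame, as an idealized program might specify them.
    """
    group = perturb(group_pose, pos_sigma=0.10, yaw_sigma=5.0, rng=rng)
    theta = math.radians(group.yaw_deg)
    children = []
    for dx, dy, dyaw in child_offsets:
        # Rotate the local offset into the world frame of the sampled group.
        wx = group.x + dx * math.cos(theta) - dy * math.sin(theta)
        wy = group.y + dx * math.sin(theta) + dy * math.cos(theta)
        ideal = Pose(wx, wy, (group.yaw_deg + dyaw) % 360.0)
        children.append(perturb(ideal, pos_sigma=0.05, yaw_sigma=10.0, rng=rng))
    return group, children


# Two chairs on opposite sides of a table group, each facing the table.
offsets = [(1.0, 0.0, 180.0), (-1.0, 0.0, 0.0)]
group, chairs = sample_hierarchical(Pose(2.0, 2.0, 0.0), offsets, random.Random(0))
```

Because the children are sampled after the group, a shifted or rotated table carries its chairs with it, and each chair adds only a small deviation of its own.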

🔎 Similar Papers

SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements

2024-08-05 · arXiv.org · Citations: 10