🤖 AI Summary
This paper addresses three challenges in constructing large-scale indoor digital twins: object collisions, structural incoherence, and poor adaptability to floor plans during 3D scene generation. To this end, we propose CHOrD, an end-to-end generative framework that uses 2D layout images as an intermediate representation. Methodologically, we introduce the first approach that exploits a 2D layout encoding to detect collisions as out-of-distribution (OOD) anomalies, so that they are prevented during generation rather than repaired afterwards; integrate conditional diffusion models with scene graph decoders to produce layouts that are collision-free, hierarchically structured, and coherent across a whole apartment; and support both floor-plan-driven and multimodal (text/sketch) control. Our contributions are: (1) the first OOD-aware, collision-robust generative mechanism; (2) a high-quality indoor layout dataset with broad coverage of household items and room configurations; and (3) state-of-the-art performance on 3D-FRONT and on our proposed dataset, yielding high-fidelity, spatially consistent, and editable apartment-level 3D scenes.
📝 Abstract
We introduce CHOrD, a novel framework for scalable synthesis of 3D indoor scenes, designed to create house-scale, collision-free, and hierarchically structured indoor digital twins. In contrast to existing methods that directly synthesize the scene layout as a scene graph or object list, CHOrD incorporates a 2D image-based intermediate layout representation, which prevents collision artifacts by capturing them as out-of-distribution (OOD) scenarios during generation. Furthermore, unlike existing methods, CHOrD can generate scene layouts that adhere to complex floor plans under multi-modal controls, enabling coherent, house-wide layouts that are robust to both geometric and semantic variations in room structure. Additionally, we propose a novel dataset with expanded coverage of household items and room configurations, as well as significantly improved data quality. CHOrD demonstrates state-of-the-art performance on both 3D-FRONT and our proposed dataset, delivering photorealistic, spatially coherent indoor scene synthesis that adapts to arbitrary floor plan variations.
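
The abstract gives no implementation details, so the following is only a rough illustration of why a 2D image-style layout makes collisions easy to see. It rasterizes top-down object footprints onto a grid and counts cells covered by more than one object. The `Footprint` schema, the grid units, and both function names are hypothetical, and this explicit overlap count is a stand-in for intuition, not CHOrD's mechanism, which treats collisions as OOD cases during generation.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    """Top-down, axis-aligned object footprint in grid cells (hypothetical schema)."""
    label: str
    col: int      # left-most column covered
    row: int      # top-most row covered
    width: int    # extent in columns
    height: int   # extent in rows


def rasterize(footprints, cols, rows):
    """Return a rows x cols grid counting how many footprints cover each cell."""
    grid = [[0] * cols for _ in range(rows)]
    for f in footprints:
        # Clip each footprint to the grid so out-of-room parts are ignored.
        for r in range(max(0, f.row), min(rows, f.row + f.height)):
            for c in range(max(0, f.col), min(cols, f.col + f.width)):
                grid[r][c] += 1
    return grid


def collision_cells(grid):
    """Count cells covered by two or more footprints."""
    return sum(1 for row in grid for count in row if count > 1)


bed = Footprint("bed", col=0, row=0, width=4, height=3)
nightstand = Footprint("nightstand", col=4, row=0, width=1, height=1)
wardrobe = Footprint("wardrobe", col=3, row=2, width=2, height=2)

clean = rasterize([bed, nightstand], cols=8, rows=6)
clash = rasterize([bed, nightstand, wardrobe], cols=8, rows=6)

print(collision_cells(clean))  # 0: the nightstand sits beside the bed
print(collision_cells(clash))  # 1: the wardrobe overlaps one corner cell of the bed
```

In an object-list or scene-graph representation, the same overlap is only implicit in the coordinates of two separate entries; in the raster it shows up directly as a visibly abnormal region of the image.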