S-INF: Towards Realistic Indoor Scene Synthesis via Scene Implicit Neural Field

Date: 2024-12-23
Citations: 0
Influential: 0
AI Summary
Existing methods for 3D indoor scene generation often oversimplify scene structure and neglect spatial and stylistic correlations among objects, leading to distorted layouts, coarse geometry, and inconsistent appearance. To address this, we propose the Scene Implicit Neural Field (S-INF), the first framework to decouple multimodal scene relationships into two hierarchical levels, layout-level (spatial topology) and detail-level (geometry/appearance), and to model them jointly within a unified implicit neural field. S-INF introduces layout-guided implicit representation learning and layout-aware feature projection, integrated with differentiable rendering for end-to-end optimization. Quantitative and qualitative experiments on 3D-FRONT show that S-INF consistently outperforms state-of-the-art methods on layout plausibility, geometric fidelity, and cross-object style consistency.

Abstract
Learning-based methods have become increasingly popular in 3D indoor scene synthesis (ISS), showing superior performance over traditional optimization-based approaches. These methods typically use generative models to model distributions over simple, explicit scene representations. However, because such oversimplified representations overlook detailed information, and because the models lack guidance from multimodal relationships within the scene, most learning-based methods struggle to generate indoor scenes with realistic object arrangements and styles. In this paper, we introduce a new method for indoor scene synthesis, Scene Implicit Neural Field (S-INF), which aims to learn meaningful representations of multimodal relationships in order to make synthesized scenes more realistic. S-INF assumes that the scene layout is often related to detailed object information. It disentangles the multimodal relationships into scene layout relationships and detailed object relationships, then fuses them through implicit neural fields (INFs). By learning specialized scene layout relationships and projecting them into S-INF, we generate realistic scene layouts. Additionally, S-INF captures dense and detailed object relationships through differentiable rendering, ensuring stylistic consistency across objects. Through extensive experiments on the benchmark 3D-FRONT dataset, we demonstrate that our method consistently achieves state-of-the-art performance across different types of ISS.

Problem

Research questions and friction points this paper is trying to address.

3D Indoor Scene Generation
Learning-based Methods
Realism and Naturalness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene Implicit Neural Fields
Differentiable Rendering
3D Indoor Scene Synthesis
Zixi Liang
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China
Guowei Xu
Tsinghua University
Language Models, Reinforcement Learning
Haifeng Wu
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Ye Huang
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China
Wen Li
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China; School of Computer Science and Engineering, University of Electronic Science and Technology of China
Lixin Duan
Data Intelligence Group (DIG) @ UESTC
Transfer Learning, Domain Adaptation