🤖 AI Summary
In single-view indoor scene reconstruction from a 2D image, severe depth ambiguity and heavy instance occlusion cause geometric distortion and incomplete layout estimation. To address these challenges, this paper proposes a two-stage decoupled framework: it first performs amodal instance completion and layout refinement in image space, then generates the 3D geometric structure. Key contributions include: (1) the first diffusion-based modular architecture for indoor scene reconstruction; (2) amodal instance completion tailored to indoor scenes; (3) a dedicated layout-refinement inpainting module; and (4) a hybrid depth-estimation scheme coupled with joint 2D/3D view alignment. Evaluated on the 3D-Front dataset, the method significantly outperforms state-of-the-art approaches in both geometric accuracy (e.g., Chamfer distance, F-Score) and visual realism (e.g., LPIPS, FID). The resulting high-fidelity reconstructions support practical applications in interior design, real estate visualization, and augmented reality.
📝 Abstract
We propose a modular framework for single-view indoor 3D scene reconstruction in which several core modules are powered by diffusion techniques. Traditional approaches to this task often struggle with the complex instance shapes and occlusions inherent in indoor environments: they overreach by attempting to predict 3D shapes directly from incomplete 2D images, which limits reconstruction quality. We overcome this limitation by splitting the process into two steps: we first employ diffusion-based techniques to predict complete views of the room background and of occluded indoor instances, and then lift these completed views into 3D. Our framework contributes the following components: an amodal completion module that restores the full view of occluded instances, an inpainting model specifically trained to predict room layouts, a hybrid depth-estimation technique that balances overall geometric accuracy with fine-detail expressiveness, and a view-space alignment method that exploits both 2D and 3D cues to place instances precisely within the scene. Together, these components reconstruct both foreground instances and the room background from a single image. Extensive experiments on the 3D-Front dataset demonstrate that our method outperforms current state-of-the-art (SOTA) approaches in both visual quality and reconstruction accuracy. The framework holds promise for applications in interior design, real estate, and augmented reality.
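To make the two-stage decomposition concrete, the sketch below shows how the four modules described in the abstract could compose into a pipeline. All function and class names here are illustrative placeholders (with stub logic), not the paper's actual implementation or API:

```python
# Hypothetical sketch of the two-stage pipeline; every name and all
# internal logic are placeholders standing in for the real modules.
from dataclasses import dataclass


@dataclass
class SceneParts:
    completed_instances: list  # amodal (de-occluded) instance views
    layout: str                # inpainted room-background layout image
    depth: list                # per-pixel depth from the hybrid estimator


def amodal_completion(instance_crops):
    # Stage 1a: diffusion-based restoration of occluded instance regions
    # (stub: tag each crop as completed).
    return [crop + "_completed" for crop in instance_crops]


def layout_inpainting(background):
    # Stage 1b: inpainting model trained to predict the instance-free
    # room layout behind the foreground (stub).
    return background + "_layout"


def hybrid_depth(image):
    # Stage 2a: blend a coarse global estimate with a detail-preserving
    # one, mirroring the accuracy/detail trade-off (stubbed values).
    coarse = [1.0] * len(image)
    fine = [0.1] * len(image)
    return [c + f for c, f in zip(coarse, fine)]


def view_space_alignment(parts):
    # Stage 2b: place each lifted instance in the scene using joint
    # 2D/3D cues (stub: mark every completed instance as placed).
    return {inst: "placed" for inst in parts.completed_instances}


def reconstruct(image, instance_crops, background):
    # Stage 1 (completion/refinement) feeds Stage 2 (geometry/alignment).
    parts = SceneParts(
        completed_instances=amodal_completion(instance_crops),
        layout=layout_inpainting(background),
        depth=hybrid_depth(image),
    )
    return view_space_alignment(parts)
```

For example, `reconstruct("img", ["chair", "table"], "wall")` would return a placement for each completed instance, illustrating that geometry generation only ever sees completed views, never the raw occluded input.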