Room Envelopes: A Synthetic Dataset for Indoor Layout Reconstruction from Images

📅 2025-11-06

🤖 AI Summary

Existing monocular scene reconstruction methods recover only visible surfaces, so they miss occluded structural elements such as walls, floors, and ceilings. Generative approaches have advanced the completion of partially observed objects, but the structural layout of a scene has received less attention, even though it is typically planar, repetitive, and simple and may therefore be predictable with less costly methods. To address this, we introduce Room Envelopes, a synthetic dataset for indoor layout recovery. For each RGB image it provides two pointmaps: one of the first visible surface and one of the first surface once fittings and fixtures are removed (the "room envelope"), which enables direct supervision of the complete structural layout. With this dataset, feed-forward monocular geometry estimators can be trained to predict both the first visible surface and the first layout surface, giving them an understanding of the scene's extent as well as the shape and location of its objects.
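
To make the per-image structure concrete, here is a minimal sketch of one sample as a Python record. It assumes each pointmap stores camera-frame XYZ coordinates per pixel, which is an assumption since the storage format is not described here. The names `RoomEnvelopeSample` and `occluder_mask` are hypothetical. Because the layout pointmap is the first surface once fittings and fixtures are removed, its depth along each pixel's ray should never be smaller than the visible depth. Pixels where it is strictly larger mark something standing in front of the room envelope.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RoomEnvelopeSample:
    """Hypothetical container for one image and its two pointmaps.

    Field names and the camera-frame XYZ convention are assumptions.
    """

    rgb: np.ndarray      # (H, W, 3) RGB image
    visible: np.ndarray  # (H, W, 3) XYZ of the first visible surface per pixel
    layout: np.ndarray   # (H, W, 3) XYZ of the first layout surface per pixel

    def __post_init__(self) -> None:
        # Both pointmaps must be pixel-aligned with the image.
        h, w = self.rgb.shape[:2]
        for name in ("visible", "layout"):
            shape = getattr(self, name).shape
            if shape != (h, w, 3):
                raise ValueError(f"{name} has shape {shape}, expected {(h, w, 3)}")


def occluder_mask(sample: RoomEnvelopeSample, tol: float = 1e-3) -> np.ndarray:
    """Return an (H, W) bool mask of pixels where an object hides the layout.

    Both pointmaps sample the same camera ray at each pixel, so comparing their
    z (depth) values is equivalent to comparing distances along that ray.
    """
    z_vis = sample.visible[..., 2]
    z_lay = sample.layout[..., 2]
    # Removing fittings and fixtures can only move the first hit farther away.
    if np.any(z_lay < z_vis - tol):
        raise ValueError("layout surface lies in front of the visible surface")
    return z_lay - z_vis > tol
```

For example, a 2x2 sample whose wall is 3 m away at every pixel except one, where a chair sits at 1.5 m, yields a mask that is true only at that pixel.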

📝 Abstract

Modern scene reconstruction methods are able to accurately recover 3D surfaces that are visible in one or more images. However, this leads to incomplete reconstructions, missing all occluded surfaces. While much progress has been made on reconstructing entire objects given partial observations using generative models, the structural elements of a scene, like the walls, floors and ceilings, have received less attention. We argue that these scene elements should be relatively easy to predict, since they are typically planar, repetitive and simple, and so less costly approaches may be suitable. In this work, we present a synthetic dataset -- Room Envelopes -- that facilitates progress on this task by providing a set of RGB images and two associated pointmaps for each image: one capturing the visible surface and one capturing the first surface once fittings and fixtures are removed, that is, the structural layout. As we show, this enables direct supervision for feed-forward monocular geometry estimators that predict both the first visible surface and the first layout surface. This confers an understanding of the scene's extent, as well as the shape and location of its objects.
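
The direct supervision described above can be sketched as a two-head regression loss. This is a hypothetical objective, not the authors' training setup: the function name, the L1 penalty, the `layout_weight` parameter, and the scale normalization are all assumptions. Because a single image does not fix absolute scale, each pair of pointmaps (predicted or ground truth) is divided by the mean distance of its own visible surface. The same factor is applied to that pair's layout map, so the gap between the two surfaces is preserved.

```python
import numpy as np


def _mean_distance(points: np.ndarray, valid: np.ndarray) -> float:
    """Mean Euclidean distance from the camera over valid pixels (floored at 1e-8)."""
    return max(float(np.linalg.norm(points[valid], axis=-1).mean()), 1e-8)


def dual_pointmap_loss(
    pred_visible: np.ndarray,
    pred_layout: np.ndarray,
    gt_visible: np.ndarray,
    gt_layout: np.ndarray,
    valid: np.ndarray,
    layout_weight: float = 1.0,
) -> float:
    """Hypothetical scale-invariant L1 loss for a model with two pointmap heads.

    Pointmaps are (H, W, 3) arrays; `valid` is an (H, W) bool mask.
    """
    # One scale per source (prediction or ground truth), shared by its two maps,
    # so global scale is ignored but the visible-to-layout gap is preserved.
    s_pred = _mean_distance(pred_visible, valid)
    s_gt = _mean_distance(gt_visible, valid)
    # Per-head L1 error, averaged over valid pixels and XYZ channels.
    l_vis = np.abs(pred_visible / s_pred - gt_visible / s_gt)[valid].mean()
    l_lay = np.abs(pred_layout / s_pred - gt_layout / s_gt)[valid].mean()
    return float(l_vis + layout_weight * l_lay)
```

Setting `layout_weight=0` reduces this to an ordinary visible-surface objective, which makes the contribution of the layout head easy to ablate.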

Problem

Research questions and friction points this paper is trying to address.

Reconstructing occluded structural elements from images

Creating a synthetic dataset for indoor layout prediction

Enabling monocular geometry estimation of hidden surfaces

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic dataset for indoor layout reconstruction

RGB images with visible and layout surface pointmaps

Direct supervision for monocular geometry estimators