🤖 AI Summary
This work addresses indoor 3D scene reconstruction by learning compact, structured, and semantically meaningful polycuboid (multi-cuboid) representations directly from noisy point clouds, enabling downstream editing tasks such as furniture rearrangement. Methodologically, it proposes an end-to-end deep learning framework: a transformer-based module detects six types of cuboid faces, a graph neural network validates the spatial relationships among the detected faces to form candidate polycuboids, and face-label aggregation reconstructs each polycuboid instance as a set of boxes. Key contributions include: (1) a synthetic polycuboid dataset designed to encode the rectilinear geometric priors of indoor scenes; (2) a representation that is markedly more compact and structurally interpretable than voxel- or mesh-based alternatives; and (3) strong generalization to real-world indoor scene datasets, including Replica, ScanNet, and iPhone-captured scenes, supporting applications such as virtual room tours and furniture-level scene editing.
📝 Abstract
This paper presents a novel framework for compactly representing a 3D indoor scene using a set of polycuboids through a deep learning-based fitting method. Indoor scenes mainly consist of man-made objects, such as furniture, which often exhibit rectilinear geometry. This property allows indoor scenes to be represented using combinations of polycuboids, providing a compact representation that benefits downstream applications like furniture rearrangement. Our framework takes a noisy point cloud as input and first detects six types of cuboid faces using a transformer network. Then, a graph neural network is used to validate the spatial relationships of the detected faces to form potential polycuboids. Finally, each polycuboid instance is reconstructed by forming a set of boxes based on the aggregated face labels. To train our networks, we introduce a synthetic dataset encompassing a diverse range of cuboid and polycuboid shapes that reflect the characteristics of indoor scenes. Our framework generalizes well to real-world indoor scene datasets, including Replica, ScanNet, and scenes captured with an iPhone. The versatility of our method is demonstrated through practical applications, such as virtual room tours and scene editing.
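To make the final aggregation step concrete, the sketch below shows how six labeled cuboid faces could be combined into one box. This is a minimal illustrative stand-in, not the paper's implementation: the `Face` type, its fields, and `box_from_faces` are hypothetical simplifications, and the actual pipeline produces face labels with a transformer and validates face relationships with a graph neural network before this aggregation.

```python
from dataclasses import dataclass

# Hypothetical simplified face record: one of six labels is encoded as
# an axis (0=x, 1=y, 2=z) plus an outward normal sign (+1 or -1),
# with the face plane's position along that axis.
@dataclass
class Face:
    axis: int
    sign: int
    offset: float

def box_from_faces(faces):
    """Aggregate six labeled faces into an axis-aligned box,
    returned as (min, max) extents per axis."""
    lo, hi = [None] * 3, [None] * 3
    for f in faces:
        if f.sign < 0:
            lo[f.axis] = f.offset   # -x / -y / -z face sets the lower bound
        else:
            hi[f.axis] = f.offset   # +x / +y / +z face sets the upper bound
    if None in lo or None in hi:
        raise ValueError("all six face labels are needed to close a box")
    return list(zip(lo, hi))

# Example: faces of a box spanning [1,2] x [0,1] x [0,3]
faces = [Face(0, -1, 1.0), Face(0, +1, 2.0),
         Face(1, -1, 0.0), Face(1, +1, 1.0),
         Face(2, -1, 0.0), Face(2, +1, 3.0)]
print(box_from_faces(faces))  # [(1.0, 2.0), (0.0, 1.0), (0.0, 3.0)]
```

A polycuboid would then be a set of such boxes sharing faces; grouping which faces belong to which box is exactly the relational reasoning the paper delegates to the graph neural network.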