SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling

📅 2024-05-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the fragmentation and redundancy issues in incremental 3D scene modeling under heterogeneous inputs (RGB, RGB-D, RGB-LiDAR, monocular, etc.) by proposing the first workflow-centric unified framework. Methodologically, the authors (i) introduce U²-MVD, the first pose- and calibration-free multi-view depth estimation model; (ii) propose a semantic-aware ScaleCov step for depth completion; and (iii) design dual-purpose multi-resolution neural points (DM-NPs) with improved point rasterization (IPR) to enable a surface-accessible color field representation. Contributions include: end-to-end support for both incremental reconstruction and SLAM; seamless integration of the dense RGB-D and monocular paradigms; state-of-the-art performance across depth estimation and surface reconstruction benchmarks; and a modular architecture enabling cross-task reusability and independent module extension.

📝 Abstract
We present SceneFactory, a workflow-centric and unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multi-view depth estimation, LiDAR completion, (dense) RGB-D/RGB-L/Mono/Depth-only reconstruction, and SLAM. The workflow-centric design uses multiple blocks as the basis for constructing different production lines. The supported applications, i.e., productions, avoid redundancy in their designs; thus, the focus is placed on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks that form SceneFactory: (1) tracking, (2) flexion, (3) depth estimation, and (4) scene reconstruction. The tracking block is based on Mono SLAM and is extended to support RGB-D and RGB-LiDAR (RGB-L) inputs. Flexion is used to convert the depth image (untrackable) into a trackable image. For general-purpose depth estimation, we propose an unposed & uncalibrated multi-view depth estimation model (U$^2$-MVD) to estimate dense geometry. U$^2$-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. A semantic-aware ScaleCov step is then introduced to complete the multi-view depth. Relying on U$^2$-MVD, SceneFactory both supports user-friendly 3D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high-quality surface and color reconstruction, we propose Dual-purpose Multi-resolutional Neural Points (DM-NPs) for the first surface-accessible Surface Color Field design, where we introduce Improved Point Rasterization (IPR) for point-cloud-based surface query. ...
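The workflow-centric idea in the abstract — a small set of reusable blocks (tracking, flexion, depth estimation, reconstruction) composed into different "production lines" per input modality — can be illustrated with a minimal sketch. All block internals, pipeline names, and the shared-state convention below are illustrative assumptions, not SceneFactory's actual API:

```python
# Hedged sketch: blocks as functions over a shared state dict, productions
# as ordered lists of blocks. String placeholders stand in for real models.
from typing import Callable, Dict, List

Block = Callable[[dict], dict]  # each block reads/extends a shared state

def tracking(state: dict) -> dict:
    # placeholder for Mono-SLAM-based tracking (extended to RGB-D/RGB-L)
    state["poses"] = f"poses({state['input']})"
    return state

def flexion(state: dict) -> dict:
    # per the abstract: converts an untrackable depth image into a trackable one
    state["input"] = f"flexed({state['input']})"
    return state

def depth_estimation(state: dict) -> dict:
    # placeholder for U^2-MVD dense geometry estimation
    state["depth"] = f"u2mvd({state['input']})"
    return state

def reconstruction(state: dict) -> dict:
    # placeholder for DM-NP surface/color reconstruction
    state["model"] = f"dmnp({state.get('depth')}, {state.get('poses')})"
    return state

# Different productions reuse the same blocks; only the composition changes.
# These two pipelines are hypothetical examples of the pattern.
PRODUCTIONS: Dict[str, List[Block]] = {
    "mono_reconstruction": [tracking, depth_estimation, reconstruction],
    "depth_only": [flexion, tracking, reconstruction],
}

def run(production: str, frame: str) -> dict:
    state = {"input": frame}
    for block in PRODUCTIONS[production]:
        state = block(state)
    return state
```

The point of the pattern is that adding a new input modality means composing a new block list, while improving a block (say, swapping the depth estimator) upgrades every production that uses it.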
Problem

Research questions and friction points this paper is trying to address.

Develops a unified framework for incremental scene modeling
Supports multi-view depth estimation and various reconstruction applications
Introduces workflow-centric design with modular blocks for flexibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Workflow-centric design with modular blocks
Unposed and uncalibrated multi-view depth estimation (U²-MVD)
Dual-purpose neural points for surface reconstruction