🤖 AI Summary
This paper addresses the fragmentation and redundancy issues in incremental 3D scene modeling under heterogeneous inputs (RGB, RGB-D, RGB-LiDAR, monocular, etc.) by proposing the first workflow-centric unified framework. Methodologically: (i) we introduce U²-MVD, the first pose- and calibration-free multi-view depth estimation model; (ii) we propose semantic-aware ScaleCov depth completion; and (iii) we design dual-purpose multi-resolution neural points (DM-NPs) with improved point rasterization (IPR) to enable a surface-accessible color field representation. Contributions include: end-to-end support for both incremental reconstruction and SLAM; seamless integration of the dense RGB-D and monocular paradigms; state-of-the-art performance on depth estimation and surface reconstruction benchmarks; and a modular architecture enabling cross-task reusability and independent extension of each module.
📝 Abstract
We present SceneFactory, a workflow-centric, unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multi-view depth estimation, LiDAR completion, (dense) RGB-D/RGB-L/Mono/Depth-only reconstruction, and SLAM. The workflow-centric design uses multiple blocks as the basis for constructing different production lines. The supported applications, i.e., productions, thus avoid redundancy in their designs, and the focus is placed on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks that form SceneFactory: (1) tracking, (2) flexion, (3) depth estimation, and (4) scene reconstruction. The tracking block is based on Mono SLAM and is extended to support RGB-D and RGB-LiDAR (RGB-L) inputs. Flexion converts the (untrackable) depth image into a trackable image. For general-purpose depth estimation, we propose an unposed and uncalibrated multi-view depth estimation model (U$^2$-MVD) to estimate dense geometry. U$^2$-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. A semantic-aware ScaleCov step is then introduced to complete the multi-view depth. Relying on U$^2$-MVD, SceneFactory both supports user-friendly 3D creation (from just images) and bridges the applications of dense RGB-D and dense monocular reconstruction. For high-quality surface and color reconstruction, we propose Dual-purpose Multi-resolutional Neural Points (DM-NPs) for the first surface-accessible Surface Color Field design, in which we introduce Improved Point Rasterization (IPR) for point-cloud-based surface query. ...
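To make the dense-bundle-adjustment step concrete: U$^2$-MVD jointly optimizes poses, shared intrinsics, and per-pixel inverse depth. The sketch below shows only the underlying reprojection residual that such an optimization would minimize; the function names, the flow-style correspondence targets, and the use of NumPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def reproject(u_i, inv_depth, K, K_inv, R, t):
    """Warp pixels u_i (N,2) from frame i into frame j, given per-pixel
    inverse depth, shared intrinsics K, and relative pose (R, t)."""
    uh = np.hstack([u_i, np.ones((len(u_i), 1))])  # homogeneous pixels (N,3)
    rays = K_inv @ uh.T                            # back-projected rays (3,N)
    pts_i = rays / inv_depth[None, :]              # 3D points in frame i
    pts_j = R @ pts_i + t[:, None]                 # transform into frame j
    proj = K @ pts_j                               # project into frame j
    return (proj[:2] / proj[2]).T                  # pixel coordinates (N,2)

def dba_residual(u_i, u_j_obs, inv_depth, K, R, t):
    """Dense-BA-style residual: mismatch between predicted correspondences
    and observed ones (e.g. produced by a matching/flow network).
    A solver would minimize ||residual||^2 over (R, t), K, and inv_depth."""
    u_pred = reproject(u_i, inv_depth, K, np.linalg.inv(K), R, t)
    return (u_pred - u_j_obs).ravel()
```

With an identity relative pose the predicted correspondences coincide with the source pixels regardless of inverse depth, so the residual vanishes; a real system stacks these residuals over all pixel pairs and solves the nonlinear least-squares problem with Gauss-Newton or Levenberg-Marquardt.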