🤖 AI Summary
Existing neural implicit methods for online object-level 3D scene reconstruction from RGB-D video streams are bottlenecked by limited real-time performance and weak shape completion. To address this, we propose an efficient online reconstruction framework featuring: (1) an updateable voxel-grid feature interpolation mechanism for dynamic geometric refinement; (2) an object library with prior-guided initialization to robustly instantiate novel objects; and (3) synergistic optimization via object-centric modeling, cross-frame view synthesis, and shape-prior transfer, jointly enhancing geometric completeness and fidelity. Evaluated on Replica, ScanNet, and a custom RGB-D dataset, our method achieves significant improvements in reconstruction accuracy and completeness over state-of-the-art neural implicit approaches, while delivering superior real-time performance and shape completion quality.
📝 Abstract
This paper addresses the problem of reconstructing a scene online at the level of objects, given an RGB-D video sequence. While current object-aware neural implicit representations hold promise, they are limited in online reconstruction efficiency and shape completion. Our main contributions to alleviating these limitations are twofold. First, we propose a feature grid interpolation mechanism to continuously update grid-based object-centric neural implicit representations as new object parts are revealed. Second, we construct an object library from previously mapped objects and leverage the corresponding shape priors to initialize geometric object models in new videos, subsequently completing them with novel views as well as synthesized past views to avoid losing original object details. Extensive experiments on synthetic environments from the Replica dataset, real-world ScanNet sequences, and videos captured in our laboratory demonstrate that our approach outperforms state-of-the-art neural implicit models for this task in terms of reconstruction accuracy and completeness.