RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction

📅 2025-07-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing neural implicit representations for large-scale online RGB-D reconstruction suffer from insufficient geometric detail, prolonged training time, and poor convergence in pose optimization. To address these issues, we propose a residual hybrid representation: a TSDF grid serves as the explicit geometric foundation, while a lightweight neural residual module captures high-frequency geometric details. We introduce a local moving volumetric partitioning scheme coupled with a divide-and-conquer online learning mechanism to enable efficient incremental updates. Instead of optimizing absolute camera poses, we jointly optimize inter-frame pose deltas and incorporate an adaptive gradient amplification strategy to accelerate convergence and improve global consistency. Experiments demonstrate that our method significantly outperforms current state-of-the-art approaches on large-scale scenes, achieving a superior trade-off among reconstruction accuracy, geometric fidelity, and real-time performance.

📝 Abstract
The introduction of neural implicit representations has notably propelled the advancement of online dense reconstruction techniques. Compared to traditional explicit representations such as TSDF, they improve mapping completeness and memory efficiency. However, the lack of reconstruction detail and the time-consuming learning of neural representations hinder the widespread application of neural-based methods to large-scale online reconstruction. We introduce RemixFusion, a novel residual-based mixed representation for scene reconstruction and camera pose estimation dedicated to high-quality and large-scale online RGB-D reconstruction. In particular, we propose a residual-based map representation composed of an explicit coarse TSDF grid and an implicit neural module that produces residuals representing fine-grained details to be added to the coarse grid. Such a mixed representation allows for detail-rich reconstruction within a bounded time and memory budget, in contrast with the overly smoothed results produced by purely implicit representations, thus paving the way for high-quality camera tracking. Furthermore, we extend the residual-based representation to handle multi-frame joint pose optimization via bundle adjustment (BA). In contrast to existing methods, which optimize poses directly, we opt to optimize pose changes. Combined with a novel technique for adaptive gradient amplification, our method attains better optimization convergence and global optimality. In addition, we adopt a local moving volume to factorize the mixed scene representation with a divide-and-conquer design, facilitating efficient online learning in our residual-based framework. Extensive experiments demonstrate that our method surpasses state-of-the-art approaches, whether based on explicit or implicit representations, in the accuracy of both mapping and tracking on large-scale scenes.
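The core idea of the mixed representation, a coarse explicit TSDF grid plus a neural module that adds fine-grained residuals, can be sketched as follows. This is a toy illustration only: the grid resolution, the tiny one-hidden-layer "residual network", and all names are hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coarse explicit TSDF grid: 8^3 voxels over the unit cube (illustrative size).
RES = 8
tsdf = rng.uniform(-1.0, 1.0, size=(RES, RES, RES))

# Tiny stand-in for the neural residual module: one hidden layer on the query point.
W1 = rng.normal(0, 0.1, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 1)); b2 = np.zeros(1)

def coarse_sdf(p):
    """Trilinear interpolation of the explicit TSDF grid at point p in [0, 1]^3."""
    x = p * (RES - 1)
    i0 = np.clip(np.floor(x).astype(int), 0, RES - 2)
    f = x - i0
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((1 - f[0]) if dx == 0 else f[0]) \
                  * ((1 - f[1]) if dy == 0 else f[1]) \
                  * ((1 - f[2]) if dz == 0 else f[2])
                val += w * tsdf[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return val

def residual(p):
    """Neural residual capturing high-frequency detail (toy MLP)."""
    h = np.tanh(p @ W1 + b1)
    return float((h @ W2 + b2)[0])

def mixed_sdf(p):
    """Mixed representation: coarse explicit value plus learned residual."""
    return coarse_sdf(p) + residual(p)

p = np.array([0.3, 0.5, 0.7])
print(mixed_sdf(p))
```

In this factorization, only the small residual network needs gradient-based training online, while the bulk of the geometry lives in the cheap explicit grid, which is what bounds the time and memory budget.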
Problem

Research questions and friction points this paper is trying to address.

Insufficient geometric detail and slow learning in large-scale online RGB-D reconstruction
How to combine explicit and implicit representations for richer scene detail
Poor convergence and global consistency in camera pose optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Residual-based mixed TSDF and neural representation
Adaptive gradient amplification for pose optimization
Local moving volume for efficient online learning
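The pose-delta idea, optimizing a correction on top of an initial pose guess rather than the absolute pose, combined with amplification of vanishing gradients, can be illustrated with a toy translation-only alignment. This is a deliberately simplified sketch (the paper operates on full camera poses within bundle adjustment); the threshold and scaling scheme here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
src = rng.normal(size=(50, 3))
t_true = np.array([0.4, -0.2, 0.1])
dst = src + t_true                     # observed correspondences

t_init = np.array([0.3, -0.1, 0.0])    # initial pose guess (e.g. from tracking)
delta = np.zeros(3)                    # optimize the pose *change*, not the pose
lr = 0.1

for _ in range(200):
    err = (src + t_init + delta) - dst          # per-point alignment residuals
    grad = 2.0 * err.mean(axis=0)               # gradient of the mean squared error
    g = np.linalg.norm(grad)
    if 0 < g < 1e-2:                            # toy adaptive amplification:
        grad *= 1e-2 / g                        # rescale vanishing gradients up
    delta -= lr * grad

t_est = t_init + delta
print(np.round(t_est, 3))                       # close to t_true
```

Optimizing `delta` from zero keeps the variable small and well-scaled regardless of how large the absolute pose is, and the amplification step keeps updates from stalling once the raw gradient becomes tiny.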
Authors
Yuqing Lan (National University of Defense Technology; 3D Vision, Computer Graphics)
Chenyang Zhu (National University of Defense Technology, China)
Shuaifeng Zhi (Imperial College London)
Jiazhao Zhang (Peking University; Embodied AI, Navigation, 3D Vision)
Zhoufeng Wang (National University of Defense Technology, China)
Renjiao Yi (National University of Defense Technology; Computer Graphics, 3D Vision)
Yijie Wang (National University of Defense Technology, China)
Kai Xu (National University of Defense Technology, China)