🤖 AI Summary
To address the challenge of balancing robustness and accuracy in real-time RGB-D SLAM dense reconstruction under severe camera motion (e.g., large viewpoint changes, rapid translation/rotation, or sudden jitter), this paper proposes a learning-optimization co-designed framework. Methodologically, it introduces a two-stage paradigm: "learning-driven initialization + geometry-guided stochastic optimization." First, a lightweight CNN regresses metrically consistent relative poses to provide high-quality initialization for optimization. Second, a stochastic sampling optimization strategy, guided by depth-map geometric consistency, is devised to achieve robust and high-precision depth alignment. On dynamic-motion benchmarks, the method significantly outperforms state-of-the-art approaches; on stable sequences, it matches their accuracy while maintaining real-time performance (>30 FPS). The core contribution is a tightly coupled mechanism between learned priors and geometric optimization that delivers real-time operation, robustness to extreme motion, and sub-centimeter reconstruction accuracy simultaneously.
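To make the two-stage paradigm concrete, below is a minimal toy sketch (not the authors' code; all names are hypothetical, and the pose is simplified to a 2D rigid transform). Stage 1 is stood in for by a fixed coarse pose guess, playing the role of the regression network's metric-aware output; stage 2 refines it by stochastic sampling around the best pose so far, minimizing a point-alignment cost that stands in for depth-map geometric consistency:

```python
import math
import random

def transform(points, pose):
    """Apply a 2D rigid transform (tx, ty, theta) to a list of points."""
    tx, ty, th = pose
    c, s = math.cos(th), math.sin(th)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def alignment_cost(pose, src, dst):
    """Mean squared distance between transformed source points and their
    targets -- a toy proxy for depth-map geometric consistency."""
    moved = transform(src, pose)
    return sum((mx - dx) ** 2 + (my - dy) ** 2
               for (mx, my), (dx, dy) in zip(moved, dst)) / len(src)

def randomized_refine(init_pose, src, dst, iters=300, sigma=0.1, seed=0):
    """Stage 2 stand-in: sample candidate poses around the best one found
    so far, shrinking the sampling radius when a sample fails to improve."""
    rng = random.Random(seed)
    best = list(init_pose)
    best_cost = alignment_cost(best, src, dst)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        cost = alignment_cost(cand, src, dst)
        if cost < best_cost:
            best, best_cost = cand, cost
        else:
            sigma *= 0.99  # anneal the search radius
    return best, best_cost

# Toy "scene": a 5x5 grid of points observed before and after camera motion.
true_pose = (0.30, -0.20, 0.25)
src = [(float(i % 5), float(i // 5)) for i in range(25)]
dst = transform(src, true_pose)

# Stage 1 stand-in: a coarse metric pose, as the regression network would give.
init_pose = (0.25, -0.15, 0.20)

refined, cost = randomized_refine(init_pose, src, dst)
print("initial cost:", round(alignment_cost(init_pose, src, dst), 4))
print("refined cost:", round(cost, 6))
```

The point of the sketch is the division of labor: the learned initializer only needs to land inside the basin of attraction, after which cheap stochastic sampling drives the geometric cost down, which mirrors why the full system stays robust under large motions while retaining optimization-level accuracy.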
📝 Abstract
Real-time dense scene reconstruction during unstable camera motions is crucial for robotics, yet current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking. Classical optimization-based methods deliver high accuracy but break down under the poor initialization that large motions cause, while learning-based approaches provide robustness but lack sufficient accuracy for dense reconstruction. We address this challenge by combining learning-based initialization with optimization-based refinement. Our method employs a camera pose regression network to predict metric-aware relative poses from consecutive RGB-D frames, which serve as reliable starting points for a randomized optimization algorithm that further aligns depth images with the scene geometry. Extensive experiments demonstrate promising results: our approach outperforms the best competitor on challenging benchmarks, while maintaining comparable accuracy on stable motion sequences. The system operates in real-time, showing that combining simple, principled techniques can achieve both robustness to unstable motions and accuracy for dense reconstruction. Project page: https://github.com/siyandong/PROFusion.