🤖 AI Summary
Existing NeRF- and 3DGS-based visual SLAM methods mostly rely on RGB-D sensors and are confined to indoor environments, which limits their use for dense reconstruction of large-scale outdoor scenes. This work introduces LSG-SLAM, described as the first stereo-camera-driven 3D Gaussian Splatting (3DGS) visual SLAM system tailored for outdoor scenarios, removing the RGB-D dependency. Key innovations include: (i) multi-modality prior pose estimation under large view changes; (ii) feature-alignment warping constraints in tracking, which reduce the adverse effect of appearance similarity on rendering losses; (iii) continuous Gaussian Splatting submaps that handle unbounded scenes with limited memory; and (iv) submap-level loop closure, in which the relative pose between looped keyframes is optimized with rendering and feature-warping losses. After global optimization of camera poses and Gaussian points, a structure refinement module improves reconstruction quality. Evaluated on the EuRoC and KITTI benchmarks, the method outperforms existing neural, 3DGS-based, and classical SLAM approaches.
📝 Abstract
The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown promising results for visual SLAM. However, most representative methods require RGB-D sensors and are applicable only to indoor environments. The robustness of reconstruction in large-scale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM system with stereo cameras, termed LSG-SLAM. LSG-SLAM employs a multi-modality strategy to estimate prior poses under large view changes. In tracking, we introduce feature-alignment warping constraints to alleviate the adverse effects of appearance similarity in rendering losses. To scale to large scenes, we introduce continuous Gaussian Splatting submaps, which handle unbounded scenes with limited memory. Loops are detected between GS submaps by place recognition, and the relative pose between looped keyframes is optimized using rendering and feature-warping losses. After the global optimization of camera poses and Gaussian points, a structure refinement module enhances the reconstruction quality. In extensive evaluations on the EuRoC and KITTI datasets, LSG-SLAM achieves superior performance over existing neural, 3DGS-based, and even traditional approaches. Project page: https://lsg-slam.github.io.
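To make the idea of a feature-warping constraint more concrete, the following is a minimal sketch of a generic reprojection-style residual between matched keypoints. It is not the LSG-SLAM implementation: the abstract does not give the exact loss, so the pinhole camera model, the use of per-keypoint depth (for example from stereo), the mean Euclidean pixel error, and all function and parameter names here are illustrative assumptions.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float]
Point = Tuple[float, float, float]


def backproject(pixel: Pixel, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point:
    """Lift a pixel with known depth to a 3D point in the camera frame."""
    u, v = pixel
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(R: Sequence[Sequence[float]], t: Sequence[float], p: Point) -> Point:
    """Apply a rigid transform p' = R p + t (R is a 3x3 row-major matrix)."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p: Point, fx: float, fy: float, cx: float, cy: float) -> Pixel:
    """Project a 3D point in the camera frame with a pinhole model."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def feature_warp_loss(pixels_a: Sequence[Pixel], depths_a: Sequence[float],
                      pixels_b: Sequence[Pixel],
                      R_ba: Sequence[Sequence[float]], t_ba: Sequence[float],
                      fx: float, fy: float, cx: float, cy: float) -> float:
    """Mean pixel distance between keypoints of frame A warped into frame B
    and their matched keypoints in frame B (hypothetical loss form)."""
    if not pixels_a:
        raise ValueError("at least one feature match is required")
    total = 0.0
    for pa, depth, pb in zip(pixels_a, depths_a, pixels_b):
        point_b = transform(R_ba, t_ba, backproject(pa, depth, fx, fy, cx, cy))
        u, v = project(point_b, fx, fy, cx, cy)
        total += ((u - pb[0]) ** 2 + (v - pb[1]) ** 2) ** 0.5
    return total / len(pixels_a)
```

Because such a residual depends on geometric correspondences and not on pixel colors, it stays informative when two views look alike photometrically, which is the situation the abstract describes as appearance similarity in rendering losses. In a real system it would be minimized over the pose (`R_ba`, `t_ba`) together with the rendering loss.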