Large-Scale Gaussian Splatting SLAM

📅 2025-05-15
🤖 AI Summary
Existing NeRF- and 3DGS-based visual SLAM methods mostly rely on RGB-D sensors and are limited to indoor environments, so their robustness for large-scale outdoor dense reconstruction remains unexplored. This work introduces LSG-SLAM, a stereo-camera 3D Gaussian Splatting (3DGS) visual SLAM system for large-scale scenes that removes the RGB-D dependency. Its main components are: (i) multi-modality prior pose estimation under large view changes; (ii) feature-alignment warping constraints in tracking, which reduce the adverse effect of appearance similarity on rendering losses; (iii) continuous Gaussian Splatting submaps for unbounded scenes with limited memory; and (iv) loop closure between submaps, with relative poses optimized using rendering and feature-warping losses. After global optimization of camera poses and Gaussian points, a structure refinement module improves reconstruction quality. On the EuRoC and KITTI datasets, the method is reported to outperform existing neural, 3DGS-based, and traditional SLAM approaches.

📝 Abstract
The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown encouraging and impressive results for visual SLAM. However, most representative methods require RGB-D sensors and are limited to indoor environments. The robustness of reconstruction in large-scale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM with stereo cameras, termed LSG-SLAM. The proposed LSG-SLAM employs a multi-modality strategy to estimate prior poses under large view changes. In tracking, we introduce feature-alignment warping constraints to alleviate the adverse effects of appearance similarity in rendering losses. For scalability to large-scale scenarios, we introduce continuous Gaussian Splatting submaps to tackle unbounded scenes with limited memory. Loops are detected between GS submaps by place recognition, and the relative pose between looped keyframes is optimized using rendering and feature-warping losses. After the global optimization of camera poses and Gaussian points, a structure refinement module enhances the reconstruction quality. With extensive evaluations on the EuRoC and KITTI datasets, LSG-SLAM achieves superior performance over existing neural, 3DGS-based, and even traditional approaches. Project page: https://lsg-slam.github.io.
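To make the idea of a feature-alignment warping constraint concrete, the following is a minimal sketch of a generic reprojection-style residual, not the paper's actual loss. It assumes a rectified pinhole stereo camera, so depth follows Z = fx * baseline / disparity, and a known relative pose (R, t) from frame A to frame B. All names (`Pinhole`, `warp_residual`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pinhole:
    fx: float
    fy: float
    cx: float
    cy: float
    baseline: float  # stereo baseline in metres (assumed rectified pair)


def backproject(cam, u, v, disparity):
    """Lift pixel (u, v) to 3D using stereo depth Z = fx * baseline / disparity."""
    z = cam.fx * cam.baseline / disparity
    return ((u - cam.cx) * z / cam.fx, (v - cam.cy) * z / cam.fy, z)


def transform(R, t, p):
    """Apply the rigid transform p' = R p + t (R is a 3x3 nested list)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(cam, p):
    """Project a 3D point in the camera frame to pixel coordinates."""
    x, y, z = p
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def warp_residual(cam, R_ba, t_ba, kp_a, disparity_a, kp_b):
    """Pixel error between kp_a warped into frame B and its matched keypoint kp_b."""
    p_a = backproject(cam, kp_a[0], kp_a[1], disparity_a)
    u, v = project(cam, transform(R_ba, t_ba, p_a))
    return (u - kp_b[0], v - kp_b[1])
```

A pose estimate that explains the feature matches well drives this residual toward zero, independently of how similar the rendered and observed images look, which is the kind of geometric signal a warping constraint can add on top of a photometric rendering loss.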
Problem

Research questions and friction points this paper is trying to address.

Enables large-scale outdoor SLAM with stereo cameras
Improves robustness under large view changes
Manages memory for unbounded scenes efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses stereo cameras for large-scale 3DGS SLAM
Employs feature-alignment warping for robust tracking
Introduces continuous Gaussian Splatting submaps for scalability
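
As a rough illustration of how submaps can bound memory in an unbounded scene, here is a minimal sketch under stated assumptions. The rollover rule (a fixed Gaussian budget per submap) and the choice to seed each new submap with the previous submap's last keyframe are assumptions made for this example, not details taken from the paper. `Submap` and `SubmapManager` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Submap:
    index: int
    keyframe_ids: list = field(default_factory=list)
    num_gaussians: int = 0


class SubmapManager:
    def __init__(self, max_gaussians):
        self.max_gaussians = max_gaussians  # assumed per-submap budget
        self.frozen = []                    # finished submaps, candidates for offloading
        self.active = Submap(index=0)

    def add_keyframe(self, keyframe_id, new_gaussians):
        """Add a keyframe; start a new submap first if the budget would be exceeded."""
        over_budget = self.active.num_gaussians + new_gaussians > self.max_gaussians
        if self.active.keyframe_ids and over_budget:
            # Share the last keyframe so consecutive submaps overlap (assumption).
            last_kf = self.active.keyframe_ids[-1]
            self.frozen.append(self.active)
            self.active = Submap(index=len(self.frozen), keyframe_ids=[last_kf])
        self.active.keyframe_ids.append(keyframe_id)
        self.active.num_gaussians += new_gaussians
        return self.active.index
```

Only the active submap needs to stay resident for tracking and mapping in this scheme; frozen submaps can be revisited later, for example when place recognition proposes a loop between two of them.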
Authors
Zhe Xin
Meituan UAV, Beijing, China
Chenyang Wu
Ph.D. Candidate, LAMDA, Nanjing University
reinforcement learning, artificial intelligence
Penghui Huang
Meituan UAV, Beijing, China
Yanyong Zhang
University of Science and Technology of China ; Rutgers University (Adjunct Visiting Professor)
Sensing, Cyber-Physical Systems, Multi-Modal Perception, Efficient AI Systems
Yinian Mao
Meituan UAV, Beijing, China
Guoquan Huang
Dept. of Mechanical Engineering, Computer and Information Sciences, University of Delaware, Newark, DE, USA