🤖 AI Summary
Existing 3D Gaussian Splatting (3DGS)-based SLAM methods suffer from memory explosion in large-scale or long-sequence scenarios, which limits them to small indoor scenes. This paper proposes VPGS-SLAM, the first 3DGS-based RGB-D SLAM framework designed explicitly for large-scale indoor and outdoor scenes. The method rests on three core contributions: (1) voxel-based progressive Gaussian mapping with multiple submaps to curb memory growth; (2) tightly coupled 2D-3D fusion camera tracking for more robust pose estimation; and (3) loop closure detection that jointly uses feature points and Gaussian ellipsoids, combined with submap fusion driven by online knowledge distillation to ensure global consistency. Integrated with voxel hash indexing, RGB-D multimodal tracking, and joint optimization, the approach achieves state-of-the-art performance across multiple large-scale indoor and outdoor benchmarks. It supports arbitrary scene scales while maintaining real-time operation, low drift, and high-fidelity reconstruction.
📝 Abstract
3D Gaussian Splatting has recently shown promising results in dense visual SLAM. However, existing 3DGS-based SLAM methods are all constrained to small-room scenarios and struggle with memory explosion in large-scale scenes and long sequences. To this end, we propose VPGS-SLAM, the first 3DGS-based large-scale RGB-D SLAM framework for both indoor and outdoor scenarios. We design a novel voxel-based progressive 3D Gaussian mapping method with multiple submaps for compact and accurate scene representation in large-scale and long-sequence scenes. This allows us to scale to scenes of arbitrary size and improves robustness, even under pose drift. In addition, we propose a 2D-3D fusion camera tracking method to achieve robust and accurate camera tracking in both indoor and outdoor large-scale scenes. Furthermore, we design a 2D-3D Gaussian loop closure method to eliminate pose drift. We also propose a submap fusion method with online distillation that restores global consistency in large-scale scenes once a loop is detected. Experiments on various indoor and outdoor datasets demonstrate the superiority and generalizability of the proposed framework. The code will be open-sourced at https://github.com/dtc111111/vpgs-slam.
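To make the idea of voxel-based mapping with multiple submaps more concrete, the following is a minimal illustrative sketch, not the VPGS-SLAM implementation. It assumes that points are hashed into integer voxel keys and that a new submap is started once the camera moves farther than a fixed radius from the current submap's origin. The names `voxel_key`, `Submap`, `SubmapManager`, `voxel_size`, and `submap_radius` are hypothetical, the distance-based trigger is an assumption, and plain 3D points stand in for the Gaussians that would be anchored to each voxel.

```python
import math
from dataclasses import dataclass, field


def voxel_key(point, voxel_size):
    """Quantize a 3D point to integer voxel coordinates (used as a hash key)."""
    return tuple(math.floor(c / voxel_size) for c in point)


@dataclass
class Submap:
    # Camera position at which this submap was created (hypothetical criterion).
    origin: tuple
    # Voxel key -> list of points; a stand-in for per-voxel Gaussians.
    voxels: dict = field(default_factory=dict)


class SubmapManager:
    """Toy manager: only the most recent submap is active and receives points."""

    def __init__(self, voxel_size=0.5, submap_radius=10.0):
        self.voxel_size = voxel_size
        self.submap_radius = submap_radius
        self.submaps = []

    def _active(self, camera_position):
        # Assumption: start a new submap when the camera leaves the
        # current submap's radius. The paper's actual rule may differ.
        if (not self.submaps or
                math.dist(camera_position, self.submaps[-1].origin) > self.submap_radius):
            self.submaps.append(Submap(origin=tuple(camera_position)))
        return self.submaps[-1]

    def insert(self, camera_position, points):
        """Add points to the active submap; return the number of newly allocated voxels."""
        submap = self._active(camera_position)
        new_voxels = 0
        for p in points:
            key = voxel_key(p, self.voxel_size)
            if key not in submap.voxels:
                submap.voxels[key] = []
                new_voxels += 1
            submap.voxels[key].append(tuple(p))
        return new_voxels
```

In this sketch, memory per submap grows with the number of occupied voxels rather than with sequence length, and older submaps could be offloaded once they are no longer active, which is the intuition behind bounding memory in long sequences.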