VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes

📅 2025-05-25
🤖 AI Summary
Existing 3D Gaussian Splatting (3DGS)-based SLAM methods suffer from memory explosion in large-scale or long-sequence scenarios, which limits their use in large indoor and outdoor environments. This paper proposes VPGS-SLAM, which the authors describe as the first 3DGS-based SLAM framework designed explicitly for large-scale scenes. It rests on three contributions: (1) voxel-based progressive Gaussian mapping with multiple submaps to curb memory growth; (2) tightly coupled 2D-3D fusion camera tracking for more robust pose estimation; and (3) loop closure detection that jointly uses feature points and Gaussian ellipsoids, combined with submap fusion driven by online distillation, to keep the map globally consistent. Integrated with voxel hash indexing, RGB-D tracking, and joint optimization, the approach is reported to achieve state-of-the-art performance across multiple large-scale indoor and outdoor benchmarks, scaling to arbitrary scene sizes while maintaining real-time operation, low drift, and high-fidelity reconstruction.
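To make the memory argument concrete, the following is a minimal sketch of how voxel hash indexing and progressive submaps could bound map growth. It is not the authors' implementation: the class names `Submap` and `ProgressiveVoxelMap`, the parameters `voxel_size` and `submap_radius`, and the rule "start a new submap once the camera moves far enough from the active submap's origin" are all assumptions made for illustration. Each voxel stores at most one anchor point, standing in for the Gaussians that would be attached to it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Submap:
    """Hypothetical submap: an origin plus a sparse voxel hash of anchors."""
    origin: tuple
    voxels: dict = field(default_factory=dict)  # voxel key -> anchor point


class ProgressiveVoxelMap:
    """Illustrative map that grows submap by submap (assumed design)."""

    def __init__(self, voxel_size=0.5, submap_radius=10.0):
        self.voxel_size = voxel_size        # assumed voxel edge length
        self.submap_radius = submap_radius  # assumed submap switching distance
        self.submaps = []

    def _key(self, point):
        # Integer voxel coordinates act as the hash key.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate(self, camera_pos, points):
        """Insert points into the active submap; return how many anchors were added."""
        if (not self.submaps
                or math.dist(camera_pos, self.submaps[-1].origin) > self.submap_radius):
            # The camera left the active submap: open a new one.
            self.submaps.append(Submap(origin=tuple(camera_pos)))
        active = self.submaps[-1]
        added = 0
        for p in points:
            k = self._key(p)
            if k not in active.voxels:
                # Only previously empty voxels receive a new anchor,
                # so revisiting a region does not grow the map.
                active.voxels[k] = tuple(p)
                added += 1
        return added
```

Under these assumptions, the size of a submap is bounded by the number of occupied voxels rather than by the number of frames, and only the active submap needs to be held for optimization at any time.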

📝 Abstract
3D Gaussian Splatting has recently shown promising results in dense visual SLAM. However, existing 3DGS-based SLAM methods are all constrained to small-room scenarios and struggle with memory explosion in large-scale scenes and long sequences. To this end, we propose VPGS-SLAM, the first 3DGS-based large-scale RGB-D SLAM framework for both indoor and outdoor scenarios. We design a novel voxel-based progressive 3D Gaussian mapping method with multiple submaps for compact and accurate scene representation in large-scale and long-sequence scenes. This allows us to scale up to arbitrary scenes and improves robustness, even under pose drift. In addition, we propose a 2D-3D fusion camera tracking method to achieve robust and accurate camera tracking in both indoor and outdoor large-scale scenes. Furthermore, we design a 2D-3D Gaussian loop closure method to eliminate pose drift. We further propose a submap fusion method with online distillation to achieve global consistency in large-scale scenes when a loop is detected. Experiments on various indoor and outdoor datasets demonstrate the superiority and generalizability of the proposed framework. The code will be open-sourced at https://github.com/dtc111111/vpgs-slam.
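The abstract does not give the tracking objective, so the sketch below only illustrates one common way to couple 2D and 3D cues: summing a squared reprojection error and a squared 3D point-to-point error for the same correspondences. The function names, the weights `w2d` and `w3d`, and the pinhole intrinsics tuple `K = (fx, fy, cx, cy)` are assumptions for illustration, not the paper's formulation.

```python
def transform(R, t, p):
    """Apply a rigid world-to-camera transform (3x3 R, 3-vector t) to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(K, p):
    """Pinhole projection of a camera-frame point; K = (fx, fy, cx, cy)."""
    fx, fy, cx, cy = K
    return (fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy)


def fusion_cost(R, t, pts_world, pts_cam_obs, uv_obs, K, w2d=1.0, w3d=1.0):
    """Weighted sum of 2D reprojection and 3D point-to-point squared errors."""
    cost = 0.0
    for pw, pc, uv in zip(pts_world, pts_cam_obs, uv_obs):
        q = transform(R, t, pw)                # predicted camera-frame point
        u, v = project(K, q)                   # predicted pixel
        e2d = (u - uv[0]) ** 2 + (v - uv[1]) ** 2
        e3d = sum((a - b) ** 2 for a, b in zip(q, pc))
        cost += w2d * e2d + w3d * e3d
    return cost
```

A pose estimator would minimize `fusion_cost` over `R` and `t`. The 2D term constrains the pose where depth is noisy or missing, which is typical outdoors, while the 3D term anchors scale and translation where depth is reliable.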

Problem

Research questions and friction points this paper is trying to address.

Addresses memory explosion in large-scale 3D Gaussian SLAM
Enables robust indoor/outdoor mapping via voxel-based progressive submaps
Reduces pose drift with 2D-3D fusion tracking and loop closure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Voxel-based progressive 3D Gaussian mapping
2D-3D fusion camera tracking method
Submap fusion with online distillation
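The summary does not describe how online distillation fuses submaps, so the sketch below covers only the geometric step that any loop-closure correction implies: once a corrective rigid transform (R, t) is estimated for a submap, each Gaussian mean moves as R·μ + t and each covariance as R·Σ·Rᵀ. The helper names and the list-of-lists matrix layout are assumptions for illustration.

```python
def matmul(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    """Transpose a 3x3 matrix given as nested lists."""
    return [list(row) for row in zip(*A)]


def apply_correction(R, t, means, covs):
    """Move a submap's Gaussians by a rigid correction (R, t)."""
    Rt = transpose(R)
    new_means = [
        tuple(sum(R[i][j] * m[j] for j in range(3)) + t[i] for i in range(3))
        for m in means
    ]
    # A covariance transforms as R * Sigma * R^T under a rotation.
    new_covs = [matmul(matmul(R, C), Rt) for C in covs]
    return new_means, new_covs
```

After such a re-alignment, overlapping submaps share a common frame, which is the precondition for any fusion step that reconciles their appearance.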
Tianchen Deng
Shanghai Jiao Tong University
Robotics, Computer Vision
Wenhua Wu
Shanghai Jiao Tong University
Computer Vision
Junjie He
Guizhou University
MRI, Deep Learning, CT
Yue Pan
University of Bonn
Xirui Jiang
Shanghai Jiao Tong University
Shenghai Yuan
Nanyang Technological University
Danwei Wang
Professor, Nanyang Technological University
Robotics, Control Engineering, Fault Diagnosis
Hesheng Wang
Shanghai Jiao Tong University
Weidong Chen
Shanghai Jiao Tong University