🤖 AI Summary
To address cumulative pose drift and scale ambiguity in 3D Gaussian Splatting SLAM for large-scale dynamic outdoor scenes, this paper proposes a LiDAR-visual fusion hierarchical collaborative Gaussian SLAM system. The method introduces two key innovations: (1) an explicit–implicit hierarchical representation mechanism that enables human-like chain-of-thought multimodal collaboration for mutual enhancement; and (2) a joint dynamic modeling module integrating open-world semantic segmentation with DINO-Depth–driven uncertainty-aware implicit residual constraints to generate fine-grained dynamic masks. Evaluated on KITTI, nuScenes, and a custom dataset, the approach achieves state-of-the-art performance—significantly suppressing scale drift while improving dynamic object removal accuracy and reconstruction robustness.
📝 Abstract
3D Gaussian Splatting SLAM has emerged as a widely used technique for high-fidelity mapping in spatial intelligence. However, existing methods often rely on a single representation scheme, which limits their performance in large-scale dynamic outdoor scenes and leads to cumulative pose errors and scale ambiguity. To address these challenges, we propose LVD-GS, a novel LiDAR-Visual 3D Gaussian Splatting SLAM system. Motivated by the human chain-of-thought process for information seeking, we introduce a hierarchical collaborative representation module that facilitates mutual reinforcement for mapping optimization, effectively mitigating scale drift and enhancing reconstruction robustness. Furthermore, to eliminate the influence of dynamic objects, we propose a joint dynamic modeling module that generates fine-grained dynamic masks by fusing open-world segmentation with implicit residual constraints, guided by uncertainty estimates derived from DINO-Depth features. Extensive evaluations on KITTI, nuScenes, and self-collected datasets demonstrate that our approach achieves state-of-the-art performance compared to existing methods.
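The abstract does not specify the exact fusion rule for the dynamic masks, but the general idea of combining an open-world segmentation mask with an uncertainty-weighted residual cue can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold `tau`, the per-pixel `uncertainty` weights (standing in for the DINO-Depth-derived estimates), and the simple logical-OR fusion are all assumptions.

```python
import numpy as np

def fuse_dynamic_mask(seg_mask: np.ndarray,
                      residual: np.ndarray,
                      uncertainty: np.ndarray,
                      tau: float = 2.0) -> np.ndarray:
    """Illustrative fusion of a semantic mask with a residual cue.

    seg_mask    -- binary mask from open-world segmentation (H, W)
    residual    -- per-pixel rendering residual of the static map (H, W)
    uncertainty -- per-pixel uncertainty weights (H, W); here a stand-in
                   for uncertainty estimated from DINO-Depth features
    tau         -- assumed threshold on the normalized residual
    """
    # Normalize residuals by uncertainty: pixels the static map cannot
    # explain (high residual relative to uncertainty) are likely dynamic.
    normalized = residual / (uncertainty + 1e-6)
    residual_mask = normalized > tau
    # Flag a pixel as dynamic if either the semantic or residual cue fires.
    return seg_mask.astype(bool) | residual_mask

# Toy example: one pixel flagged semantically, one by its residual.
seg = np.array([[1, 0], [0, 0]])
res = np.array([[0.1, 5.0], [0.1, 0.1]])
unc = np.ones((2, 2))
mask = fuse_dynamic_mask(seg, res, unc)
```

In practice a system like the one described would likely use a learned or probabilistic combination rather than a hard OR, but the sketch conveys how the two cues can cover each other's failure modes: segmentation catches known movable classes, while the residual term catches unmodeled motion.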