GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments

Date: 2026-04-14

AI Summary
Existing 3D Gaussian Splatting-based visual SLAM methods suffer significant performance degradation in dynamic environments due to their reliance on the static-scene assumption. To address this limitation, this work proposes GGD-SLAM, a framework that achieves robust camera localization and dense mapping without requiring semantic annotations or depth inputs. The method leverages a FIFO queue and a sequential attention mechanism to extract dynamic features, employs a dynamic feature enhancer to disentangle static and dynamic scene components, and introduces an occlusion-aware inpainting strategy alongside an interference-resistant adaptive SSIM loss function. Evaluated on real-world dynamic datasets, GGD-SLAM demonstrates state-of-the-art performance in both camera pose estimation and dense reconstruction.

πŸ“ Abstract
Visual SLAM algorithms achieve significant improvements through the exploration of 3D Gaussian Splatting (3DGS) representations, particularly in generating high-fidelity dense maps. However, they depend on a static environment assumption and experience significant performance degradation in dynamic environments. This paper presents GGD-SLAM, a framework that employs a generalizable motion model to address the challenges of localization and dense mapping in dynamic environments, without predefined semantic annotations or depth input. Specifically, the proposed system employs a First-In-First-Out (FIFO) queue to manage incoming frames, facilitating dynamic semantic feature extraction through a sequential attention mechanism. This is integrated with a dynamic feature enhancer to separate static and dynamic components. Additionally, to minimize dynamic distractors' impact on the static components, we devise a method to fill occluded areas via static information sampling and design a distractor-adaptive Structural Similarity Index Measure (SSIM) loss tailored for dynamic environments, significantly enhancing the system's resilience. Experiments conducted on real-world dynamic datasets demonstrate that the proposed system achieves state-of-the-art performance in camera pose estimation and dense reconstruction in dynamic scenes.
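The abstract names two mechanisms without giving their formulation: a FIFO queue over incoming frames and an SSIM loss that adapts to dynamic distractors. The sketch below is only an illustration of those two ideas under stated assumptions, not the authors' implementation. `FrameQueue`, `weighted_dssim_loss`, the queue capacity, and the weighting rule `1 - p` (where `p` is a per-pixel probability of being dynamic) are all hypothetical choices made for this example.

```python
from collections import deque


class FrameQueue:
    """Fixed-size FIFO window over incoming frames (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item on overflow,
        # which gives first-in-first-out behaviour.
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        self._frames.append(frame)

    def window(self):
        # Oldest frame first, newest frame last.
        return list(self._frames)


def weighted_dssim_loss(ssim_map, distractor_prob, eps=1e-8):
    """Weighted mean of (1 - SSIM) over pixels.

    ssim_map:        flat list of per-pixel SSIM values.
    distractor_prob: flat list of per-pixel probabilities of being dynamic.
    The weight 1 - p is an assumption for illustration: pixels that are
    likely dynamic contribute less to the loss.
    """
    if len(ssim_map) != len(distractor_prob):
        raise ValueError("ssim_map and distractor_prob must have equal length")
    weights = [1.0 - p for p in distractor_prob]
    numerator = sum(w * (1.0 - s) for w, s in zip(weights, ssim_map))
    return numerator / (sum(weights) + eps)


if __name__ == "__main__":
    queue = FrameQueue(capacity=3)
    for frame_id in range(5):
        queue.push(frame_id)
    print(queue.window())  # the three most recent frame ids

    # A badly reconstructed pixel (SSIM 0) that is certainly dynamic (p = 1)
    # is ignored, so only the well reconstructed static pixel counts.
    print(weighted_dssim_loss([1.0, 0.0], [0.0, 1.0]))
```

In this sketch the per-pixel SSIM map and the distractor probabilities are taken as given inputs; how GGD-SLAM actually derives them (sequential attention, dynamic feature enhancer) is not specified in the abstract and is left out here.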
Problem

Research questions and friction points this paper is trying to address.

Visual SLAM
Dynamic Environments
3D Gaussian Splatting
Monocular
Dense Mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
dynamic SLAM
generalizable motion model
sequential attention mechanism
distractor-adaptive SSIM
Yi Liu
Tsinghua University
robot vision SLAM
Haoxuan Xu
Beihang University
computer vision
Hongbo Duan
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Keyu Fan
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Zhengyang Zhang
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Peiyu Zhuang
School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen, China
Pengting Luo
Central Media Technology Institute, Huawei Incorporated Company, Shenzhen, China
Houde Liu
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China