GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments

📅 2026-04-14

🤖 AI Summary

Existing 3D Gaussian Splatting-based visual SLAM methods suffer significant performance degradation in dynamic environments due to their reliance on the static-scene assumption. To address this limitation, this work proposes GGD-SLAM, a framework that achieves robust camera localization and dense mapping without requiring semantic annotations or depth inputs. The method leverages a FIFO queue and a sequential attention mechanism to extract dynamic features, employs a dynamic feature enhancer to disentangle static and dynamic scene components, and introduces an occlusion-aware inpainting strategy alongside an interference-resistant adaptive SSIM loss function. Evaluated on real-world dynamic datasets, GGD-SLAM demonstrates state-of-the-art performance in both camera pose estimation and dense reconstruction.
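The FIFO frame management mentioned above can be illustrated with a minimal sketch. The class name `FrameQueue`, its methods, and the capacity value are hypothetical and chosen only for illustration; the summary does not specify the queue length or what is stored per frame.

```python
from collections import deque


class FrameQueue:
    """Fixed-capacity FIFO buffer of recent frames (hypothetical helper)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen drops the oldest item automatically when full.
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        """Add the newest frame; return the evicted oldest frame, or None."""
        is_full = len(self._frames) == self._frames.maxlen
        evicted = self._frames[0] if is_full else None
        self._frames.append(frame)
        return evicted

    def window(self):
        """Return the buffered frames ordered from oldest to newest."""
        return list(self._frames)
```

A sequential attention module would presumably consume `window()` each time a new frame arrives, so that dynamic cues are extracted from a short temporal context rather than from a single image.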

📝 Abstract

Visual SLAM algorithms have improved significantly by adopting 3D Gaussian Splatting (3DGS) representations, particularly in generating high-fidelity dense maps. However, they depend on a static-environment assumption and degrade significantly in dynamic environments. This paper presents GGD-SLAM, a framework that employs a generalizable motion model to address the challenges of localization and dense mapping in dynamic environments, without predefined semantic annotations or depth input. Specifically, the proposed system employs a First-In-First-Out (FIFO) queue to manage incoming frames, facilitating dynamic semantic feature extraction through a sequential attention mechanism. This is integrated with a dynamic feature enhancer to separate static and dynamic components. Additionally, to minimize the impact of dynamic distractors on the static components, we devise a method to fill occluded areas via static information sampling and design a distractor-adaptive Structural Similarity Index Measure (SSIM) loss tailored for dynamic environments, significantly enhancing the system's resilience. Experiments conducted on real-world dynamic datasets demonstrate that the proposed system achieves state-of-the-art performance in camera pose estimation and dense reconstruction in dynamic scenes.
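To give a rough sense of what a distractor-aware SSIM loss could look like, the sketch below down-weights image patches that are likely to contain dynamic content. This is an assumption-laden illustration, not the paper's formulation: the abstract does not define the adaptive scheme, and the function names, the per-patch weights, and the use of non-overlapping flat patches (instead of the usual Gaussian sliding window) are all hypothetical simplifications. The constants follow the common SSIM defaults for intensities in [0, 1].

```python
def patch_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM between two equally sized patches given as flat lists in [0, 1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def weighted_ssim_loss(rendered, target, static_weights):
    """Weighted mean of (1 - SSIM) over patches.

    static_weights are hypothetical per-patch values in [0, 1]:
    1 means confidently static, 0 means a dynamic distractor to ignore.
    """
    total_weight = sum(static_weights)
    if total_weight == 0:
        return 0.0  # nothing static to supervise
    weighted = sum(
        w * (1.0 - patch_ssim(r, t))
        for r, t, w in zip(rendered, target, static_weights)
    )
    return weighted / total_weight
```

With this weighting, a patch covered by a moving object contributes nothing to the photometric supervision when its weight is 0, which is one simple way to keep distractors from corrupting the static map.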

Problem

Research questions and friction points this paper is trying to address.

Visual SLAM

Dynamic Environments

3D Gaussian Splatting

Monocular

Dense Mapping

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting

Dynamic SLAM

Generalizable Motion Model