🤖 AI Summary
Existing NeRF- or 3D Gaussian Splatting (3DGS)-based SLAM methods struggle to simultaneously achieve real-time localization, mapping, and high-fidelity rendering in dynamic scenes, particularly under monocular RGB input. This paper introduces the first purely monocular RGB dynamic SLAM system built upon the 3DGS framework. The method addresses key challenges via three core innovations: (1) a probabilistic dynamic-mask generation mechanism that integrates optical flow and depth estimation for robust motion-region detection; (2) a motion-aware rendering loss that explicitly models non-rigid motion at dynamic pixels; and (3) joint optimization of camera poses and Gaussian parameters within a single network iteration, substantially improving computational efficiency. Extensive experiments demonstrate state-of-the-art tracking accuracy and rendering quality in dynamic scenarios, matching or surpassing leading RGB-D dynamic SLAM approaches while operating solely on monocular video.
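The first innovation fuses two per-pixel motion cues into one dynamic mask via a probabilistic model. The paper's exact formulation is not given here, so the sketch below assumes a simple noisy-OR combination of per-pixel motion probabilities from optical-flow residuals and depth inconsistency; the function name and threshold are illustrative only.

```python
import numpy as np

def fuse_dynamic_masks(p_flow, p_depth):
    """Noisy-OR fusion of two per-pixel motion-probability maps.

    p_flow:  probability a pixel is dynamic, from optical-flow residuals
    p_depth: probability a pixel is dynamic, from depth inconsistency
    Returns a fused probability map; a pixel is dynamic if either cue says so.
    """
    return 1.0 - (1.0 - p_flow) * (1.0 - p_depth)

# Toy 2x2 example: top-left pixel is flagged strongly by both cues.
p_flow = np.array([[0.9, 0.1], [0.2, 0.0]])
p_depth = np.array([[0.8, 0.1], [0.0, 0.0]])
fused = fuse_dynamic_masks(p_flow, p_depth)
dynamic_mask = fused > 0.5  # binarize for downstream masking
```

A noisy-OR is a natural choice when either cue alone is sufficient evidence of motion, since the fused probability is never lower than either input.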
📝 Abstract
Current Simultaneous Localization and Mapping (SLAM) methods based on Neural Radiance Fields (NeRF) or 3D Gaussian Splatting excel at reconstructing static 3D scenes but struggle with tracking and reconstruction in dynamic environments, such as real-world scenes with moving elements. Existing NeRF-based SLAM approaches that address dynamic challenges typically rely on RGB-D inputs, and few methods accommodate pure RGB input. To overcome these limitations, we propose Dy3DGS-SLAM, the first 3D Gaussian Splatting (3DGS) SLAM method for dynamic scenes using monocular RGB input. To address dynamic interference, we fuse optical flow masks and depth masks through a probabilistic model to obtain a fused dynamic mask. With only a single network iteration, this mask constrains the tracking scale and refines the rendered geometry. Based on the fused dynamic mask, we design a novel motion loss that constrains the pose estimation network for tracking. In mapping, we apply rendering losses over dynamic pixels, color, and depth to eliminate the transient interference and occlusion caused by dynamic objects. Experimental results demonstrate that Dy3DGS-SLAM achieves state-of-the-art tracking and rendering in dynamic environments, outperforming or matching existing RGB-D methods.
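The mapping step described above excludes dynamic pixels so that transient objects do not corrupt the reconstructed map. A minimal sketch of such a masked rendering loss, assuming an L1 color term plus a weighted L1 depth term restricted to static pixels (the loss weights and exact terms are assumptions, not the paper's definition):

```python
import numpy as np

def masked_rendering_loss(rendered_rgb, gt_rgb, rendered_depth, gt_depth,
                          dynamic_mask, lambda_depth=0.5):
    """L1 color + depth rendering loss computed only over static pixels.

    dynamic_mask: boolean map, True where a pixel is believed dynamic;
    those pixels are excluded so moving objects do not pull the map away
    from the static scene geometry.
    """
    static = ~dynamic_mask
    n = max(static.sum(), 1)  # avoid division by zero if everything is dynamic
    color_term = np.abs(rendered_rgb - gt_rgb)[static].sum() / n
    depth_term = np.abs(rendered_depth - gt_depth)[static].sum() / n
    return color_term + lambda_depth * depth_term

# Toy 2x2 grayscale example with one dynamic pixel masked out.
dynamic_mask = np.array([[True, False], [False, False]])
rendered_rgb = np.zeros((2, 2))
gt_rgb = np.full((2, 2), 0.1)
rendered_depth = np.zeros((2, 2))
gt_depth = np.full((2, 2), 0.2)
loss = masked_rendering_loss(rendered_rgb, gt_rgb,
                             rendered_depth, gt_depth, dynamic_mask)
```

In the toy example the three static pixels each contribute an 0.1 color error and 0.2 depth error, so the loss is 0.1 + 0.5 * 0.2 = 0.2; the masked pixel contributes nothing regardless of its error.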