Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing NeRF- or 3D Gaussian Splatting (3DGS)-based SLAM methods struggle to simultaneously achieve real-time localization, mapping, and high-fidelity rendering in dynamic scenes—particularly under monocular RGB input. This paper introduces the first purely monocular RGB dynamic SLAM system built upon the 3DGS framework. Our method addresses key challenges via three core innovations: (1) a probabilistic dynamic mask generation mechanism integrating optical flow and depth estimation for robust motion region detection; (2) a motion-aware rendering loss explicitly modeling non-rigid motion at dynamic pixels; and (3) joint optimization of camera poses and Gaussian parameters within a single network iteration, drastically improving computational efficiency. Extensive experiments demonstrate state-of-the-art tracking accuracy and rendering quality in dynamic scenarios, matching or surpassing leading RGB-D dynamic SLAM approaches while operating solely on monocular video.

📝 Abstract
Current Simultaneous Localization and Mapping (SLAM) methods based on Neural Radiance Fields (NeRF) or 3D Gaussian Splatting excel in reconstructing static 3D scenes but struggle with tracking and reconstruction in dynamic environments, such as real-world scenes with moving elements. Existing NeRF-based SLAM approaches addressing dynamic challenges typically rely on RGB-D inputs, with few methods accommodating pure RGB input. To overcome these limitations, we propose Dy3DGS-SLAM, the first 3D Gaussian Splatting (3DGS) SLAM method for dynamic scenes using monocular RGB input. To address dynamic interference, we fuse optical flow masks and depth masks through a probabilistic model to obtain a fused dynamic mask. With only a single network iteration, this can constrain tracking scales and refine rendered geometry. Based on the fused dynamic mask, we design a novel motion loss to constrain the pose estimation network for tracking. In mapping, we use the rendering loss of dynamic pixels, color, and depth to eliminate transient interference and occlusion caused by dynamic objects. Experimental results demonstrate that Dy3DGS-SLAM achieves state-of-the-art tracking and rendering in dynamic environments, outperforming or matching existing RGB-D methods.
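The abstract describes fusing an optical-flow mask and a depth mask through a probabilistic model into a single dynamic mask. The paper's exact probabilistic formulation is not given here, so the following is a minimal sketch assuming a noisy-OR style fusion of per-pixel probabilities derived from residual magnitudes; all function names (`flow_to_prob`, `fuse_dynamic_masks`) and the sigmoid mapping are illustrative, not the authors' implementation.

```python
import numpy as np

def flow_to_prob(residual, scale=2.0, offset=1.0):
    """Map a per-pixel residual magnitude (e.g. observed flow minus the flow
    induced by the estimated camera motion, or a depth inconsistency) to a
    dynamic probability via a sigmoid. Hypothetical mapping, not from the paper."""
    return 1.0 / (1.0 + np.exp(-scale * (residual - offset)))

def fuse_dynamic_masks(p_flow, p_depth, eps=1e-6):
    """Noisy-OR fusion (assumption): a pixel is dynamic if either cue flags it,
    i.e. p = 1 - (1 - p_flow)(1 - p_depth)."""
    p_dyn = 1.0 - (1.0 - p_flow) * (1.0 - p_depth)
    return np.clip(p_dyn, eps, 1.0 - eps)

# Toy example: a 4x4 frame with one moving region producing large residuals.
flow_res = np.zeros((4, 4)); flow_res[1:3, 1:3] = 3.0
depth_res = np.zeros((4, 4)); depth_res[1:3, 1:3] = 2.5
p_dyn = fuse_dynamic_masks(flow_to_prob(flow_res), flow_to_prob(depth_res))
mask = p_dyn > 0.5  # binary dynamic mask used to gate tracking and mapping
```

A noisy-OR keeps the fusion conservative: either cue alone can mark a pixel dynamic, which matters because optical flow and depth residuals fail in different situations (e.g. motion parallel to the epipolar line vs. texture-poor regions).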
Problem

Research questions and friction points this paper is trying to address.

Monocular SLAM struggles with dynamic scene tracking
Existing methods rely on RGB-D, not pure RGB input
Dynamic objects cause interference in 3D reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular RGB input for dynamic 3DGS SLAM
Fused dynamic mask using optical flow and depth
Novel motion loss for pose estimation network
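The mapping stage reportedly uses the rendering loss of dynamic pixels, color, and depth to suppress transient interference. A common way to realize this, and the assumption behind the sketch below, is to downweight each pixel's color and depth error by its dynamic probability so that static pixels dominate the Gaussian-map optimization; the weighting scheme and `lambda_depth` balance are illustrative, not the paper's exact loss.

```python
import numpy as np

def masked_rendering_loss(rendered_rgb, target_rgb,
                          rendered_depth, target_depth,
                          p_dyn, lambda_depth=0.5):
    """Hedged sketch of a dynamic-aware mapping loss: pixels with high dynamic
    probability contribute little, so moving objects do not corrupt the map.
    Uses simple L1 terms; the authors' formulation may differ."""
    w = 1.0 - p_dyn                                               # static weight in [0, 1]
    color_term = w[..., None] * np.abs(rendered_rgb - target_rgb)  # weighted L1 color
    depth_term = w * np.abs(rendered_depth - target_depth)         # weighted L1 depth
    return color_term.mean() + lambda_depth * depth_term.mean()

# Toy check: rendering error concentrated on a fully dynamic region.
target_rgb = np.zeros((4, 4, 3)); target_depth = np.zeros((4, 4))
rendered_rgb = target_rgb.copy(); rendered_rgb[1:3, 1:3] = 1.0
rendered_depth = target_depth.copy(); rendered_depth[1:3, 1:3] = 1.0
p_dyn = np.zeros((4, 4)); p_dyn[1:3, 1:3] = 1.0
masked = masked_rendering_loss(rendered_rgb, target_rgb,
                               rendered_depth, target_depth, p_dyn)
naive = masked_rendering_loss(rendered_rgb, target_rgb,
                              rendered_depth, target_depth, np.zeros((4, 4)))
```

With the dynamic region fully masked, its error contributes nothing (`masked` is zero here), while the unmasked loss is inflated by the moving object; this is the mechanism by which transient occluders are kept out of the reconstructed map.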
Mingrui Li
Dalian University of Technology
SLAM, 3D Vision, Robotics
Yiming Zhou
Meta | UCLA
Hongxing Zhou
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
Xinggang Hu
School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, China
Florian Roemer
TU Ilmenau; Fraunhofer Institute for Nondestructive Testing IZFP
Compressed Sensing, AI, NDT
Hongyu Wang
School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, China
Ahmad Osman
Professor for Sensor Technologies, Signal and Image Processing
Artificial Intelligence, Deep Learning, Data Reconstruction, Inspection Technologies