🤖 AI Summary
Existing NeRF- or 3D Gaussian Splatting (3DGS)-based SLAM methods struggle to simultaneously achieve real-time localization, mapping, and high-fidelity rendering in dynamic scenes, particularly under monocular RGB input. This paper introduces the first purely monocular RGB dynamic SLAM system built upon the 3DGS framework. The method addresses key challenges via three core innovations: (1) a probabilistic dynamic-mask generation mechanism that integrates optical flow and depth estimation for robust motion-region detection; (2) a motion-aware rendering loss that explicitly models non-rigid motion at dynamic pixels; and (3) joint optimization of camera poses and Gaussian parameters within a single network iteration, substantially improving computational efficiency. Extensive experiments demonstrate state-of-the-art tracking accuracy and rendering quality in dynamic scenarios, matching or surpassing leading RGB-D dynamic SLAM approaches while operating solely on monocular video.
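The first innovation fuses two per-pixel motion cues into one dynamic mask via a probabilistic model. The paper's exact formulation is not given here, so the sketch below assumes a simple noisy-OR combination of per-pixel motion probabilities from optical-flow residuals and depth inconsistency; the function name and threshold are illustrative only.

```python
import numpy as np

def fuse_dynamic_masks(p_flow, p_depth):
    """Noisy-OR fusion of two per-pixel motion-probability maps.

    p_flow:  probability a pixel is dynamic, from optical-flow residuals
    p_depth: probability a pixel is dynamic, from depth inconsistency
    Returns a fused probability map; a pixel is dynamic if either cue says so.
    """
    return 1.0 - (1.0 - p_flow) * (1.0 - p_depth)

# Toy 2x2 example: top-left pixel is flagged strongly by both cues.
p_flow = np.array([[0.9, 0.1], [0.2, 0.0]])
p_depth = np.array([[0.8, 0.1], [0.0, 0.0]])
fused = fuse_dynamic_masks(p_flow, p_depth)
dynamic_mask = fused > 0.5  # binarize for downstream masking
```

A noisy-OR is a natural choice when either cue alone is sufficient evidence of motion, since the fused probability is never lower than either input.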
📝 Abstract
Current Simultaneous Localization and Mapping (SLAM) methods based on Neural Radiance Fields (NeRF) or 3D Gaussian Splatting excel at reconstructing static 3D scenes but struggle with tracking and reconstruction in dynamic environments, such as real-world scenes with moving elements. Existing NeRF-based SLAM approaches that address dynamic challenges typically rely on RGB-D inputs, and few methods accommodate pure RGB input. To overcome these limitations, we propose Dy3DGS-SLAM, the first 3D Gaussian Splatting (3DGS) SLAM method for dynamic scenes using monocular RGB input. To address dynamic interference, we fuse optical flow masks and depth masks through a probabilistic model to obtain a fused dynamic mask. With only a single network iteration, this mask constrains the tracking scale and refines the rendered geometry. Based on the fused dynamic mask, we design a novel motion loss that constrains the pose estimation network for tracking. In mapping, we apply rendering losses over dynamic pixels, color, and depth to eliminate the transient interference and occlusion caused by dynamic objects. Experimental results demonstrate that Dy3DGS-SLAM achieves state-of-the-art tracking and rendering in dynamic environments, outperforming or matching existing RGB-D methods.
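The mapping step described above excludes dynamic pixels so that transient objects do not corrupt the reconstructed map. A minimal sketch of such a masked rendering loss, assuming an L1 color term plus a weighted L1 depth term restricted to static pixels (the loss weights and exact terms are assumptions, not the paper's definition):

```python
import numpy as np

def masked_rendering_loss(rendered_rgb, gt_rgb, rendered_depth, gt_depth,
                          dynamic_mask, lambda_depth=0.5):
    """L1 color + depth rendering loss computed only over static pixels.

    dynamic_mask: boolean map, True where a pixel is believed dynamic;
    those pixels are excluded so moving objects do not pull the map away
    from the static scene geometry.
    """
    static = ~dynamic_mask
    n = max(static.sum(), 1)  # avoid division by zero if everything is dynamic
    color_term = np.abs(rendered_rgb - gt_rgb)[static].sum() / n
    depth_term = np.abs(rendered_depth - gt_depth)[static].sum() / n
    return color_term + lambda_depth * depth_term

# Toy 2x2 grayscale example with one dynamic pixel masked out.
dynamic_mask = np.array([[True, False], [False, False]])
rendered_rgb = np.zeros((2, 2))
gt_rgb = np.full((2, 2), 0.1)
rendered_depth = np.zeros((2, 2))
gt_depth = np.full((2, 2), 0.2)
loss = masked_rendering_loss(rendered_rgb, gt_rgb,
                             rendered_depth, gt_depth, dynamic_mask)
```

In the toy example the three static pixels each contribute an 0.1 color error and 0.2 depth error, so the loss is 0.1 + 0.5 * 0.2 = 0.2; the masked pixel contributes nothing regardless of its error.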