🤖 AI Summary
Accurately estimating camera poses while simultaneously reconstructing dynamic scenes remains challenging in visual SLAM. This work proposes a flow-guided dynamic 4D Gaussian splatting SLAM framework that uses depth and optical flow to generate category-agnostic motion masks, separating static from dynamic Gaussians. To accelerate optimization of the dynamic Gaussians, the method models their temporal centers at keyframes, propagates those centers with 3D scene flow, and initializes them with an adaptive insertion strategy. A Gaussian Mixture Model (GMM) models the temporal opacity and rotation of the dynamic Gaussians so that complex motion patterns can be learned adaptively. Experiments show that the proposed approach achieves state-of-the-art performance in tracking accuracy, dynamic reconstruction quality, and training efficiency.
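To make the idea of a GMM-based temporal opacity concrete, here is a minimal sketch in Python. The summary does not give the exact parameterization, so everything here is an assumption for illustration: the function name `temporal_opacity`, the use of unnormalized Gaussian bumps over a normalized timestamp, and the final clipping to [0, 1]. Rotation is omitted.

```python
import numpy as np

def temporal_opacity(t, base_opacity, weights, means, sigmas):
    """Opacity of one dynamic Gaussian as a function of time (hypothetical form).

    t            : scalar or array of timestamps
    base_opacity : time-independent opacity in [0, 1]
    weights, means, sigmas : arrays of shape (K,), one entry per mixture component
    """
    t = np.asarray(t, dtype=np.float64)[..., None]           # (..., 1) for broadcasting
    # Each component is a Gaussian bump in time with peak value weights[k].
    comps = weights * np.exp(-0.5 * ((t - means) / sigmas) ** 2)
    # Sum the components, scale by the base opacity, and keep the result valid.
    return np.clip(base_opacity * comps.sum(axis=-1), 0.0, 1.0)

# Example: a Gaussian that is visible around t = 0.2 and again around t = 0.8.
timestamps = np.linspace(0.0, 1.0, 5)
opacity = temporal_opacity(
    timestamps,
    base_opacity=0.9,
    weights=np.array([1.0, 0.6]),
    means=np.array([0.2, 0.8]),
    sigmas=np.array([0.05, 0.1]),
)
```

With several components, a single Gaussian can fade in and out more than once over a sequence, which a single-peak temporal model cannot express.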
📝 Abstract
Handling dynamic environments is a significant research challenge in Visual Simultaneous Localization and Mapping (SLAM). Recent research combines 3D Gaussian Splatting (3DGS) with SLAM to achieve both robust camera pose estimation and photorealistic rendering. However, using SLAM to efficiently reconstruct both static and dynamic regions remains challenging. In this work, we propose an efficient framework for dynamic 3DGS SLAM guided by optical flow. Using the input depth and prior optical flow, we first propose a category-agnostic motion mask generation strategy that fits a camera ego-motion model to decompose the optical flow. This module separates dynamic and static Gaussians and simultaneously provides flow-guided camera pose initialization. We boost the training speed of dynamic 3DGS by explicitly modeling the temporal centers of the dynamic Gaussians at keyframes. These centers are propagated using 3D scene flow priors and are dynamically initialized with an adaptive insertion strategy. Alongside this, we model the temporal opacity and rotation using a Gaussian Mixture Model (GMM) to adaptively learn complex dynamics. Empirical results show that our method achieves state-of-the-art performance in tracking, dynamic reconstruction, and training efficiency.
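The flow decomposition behind the motion mask can be illustrated with a short NumPy sketch. It is not the authors' implementation: it assumes a pinhole camera, valid positive depth at every pixel, and a relative pose `(R, t)` that is already known, whereas the abstract describes fitting the ego-motion model. The function names `ego_flow` and `motion_mask` and the pixel threshold `thresh_px` are hypothetical.

```python
import numpy as np

def ego_flow(depth, K, R, t):
    """Optical flow that camera motion alone would induce on a static scene.

    depth : (H, W) depth map of frame 1 (assumed valid and positive)
    K     : (3, 3) pinhole intrinsics
    R, t  : rotation (3, 3) and translation (3,) mapping frame-1 camera
            coordinates to frame-2 camera coordinates
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels (H, W, 3)
    pts1 = (pix @ np.linalg.inv(K).T) * depth[..., None]  # back-project to 3D
    pts2 = pts1 @ R.T + t                                 # move into frame 2
    proj = pts2 @ K.T                                     # project with intrinsics
    uv2 = proj[..., :2] / proj[..., 2:3]
    return uv2 - pix[..., :2]

def motion_mask(flow, depth, K, R, t, thresh_px=1.0):
    """Mark pixels whose observed flow is not explained by camera motion."""
    residual = flow - ego_flow(depth, K, R, t)
    return np.linalg.norm(residual, axis=-1) > thresh_px

# Synthetic example: a flat scene 2 m away, camera translating along x,
# plus a 10 x 10 pixel region that moves independently.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
flow = ego_flow(depth, K, R, t)
flow[10:20, 30:40, 0] += 3.0          # extra motion from a moving object
mask = motion_mask(flow, depth, K, R, t)
```

Because the mask depends only on the flow residual and not on semantic labels, any object that moves independently of the camera is flagged, which is what makes this kind of mask category-agnostic.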