🤖 AI Summary
This work addresses SLAM in dynamic scenes and proposes what it presents as the first method to embed a 4D Gaussian radiance field into a real-time simultaneous localization and mapping framework. Conventional approaches assume static scenes and ignore dynamic objects. To overcome this limitation, we jointly optimize camera poses and a spatio-temporal decomposition of the scene into static and dynamic components, represented by a 4D Gaussian field. Specifically, we use motion masks to guide incremental modeling of static and dynamic Gaussians; model the non-rigid motion of dynamic Gaussians with a learnable transformation field built from sparse control points and an MLP; and supervise motion with an optical flow reconstruction loss, in which flow between neighboring frames is rendered through differentiable Gaussian rasterization. On real-world dynamic RGB-D sequences, the method improves tracking robustness and novel-view synthesis quality, and both qualitative and quantitative results favor it over static-scene baselines.
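The static/dynamic split and the control-point transformation field can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the dict field `pixel`, the RBF blending of the `k` nearest control points with width `sigma`, and the two-layer MLP that maps a control point's position and a timestamp to a 3D translation are all assumptions, and rotations are omitted. Gaussians spawned from mask-flagged pixels go to the dynamic set; only those are moved by the field.

```python
import math
import random


def split_by_motion_mask(new_gaussians, motion_mask):
    """Route Gaussians spawned from a new frame into static/dynamic sets.

    Hypothetical layout: each Gaussian records the (row, col) pixel it was
    back-projected from; motion_mask[row][col] is True for dynamic pixels.
    """
    static, dynamic = [], []
    for g in new_gaussians:
        r, c = g["pixel"]
        (dynamic if motion_mask[r][c] else static).append(g)
    return static, dynamic


def init_mlp(in_dim, hidden, out_dim, seed=0):
    """Random weights for a tiny two-layer MLP (untrained, for illustration)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(out_dim)]
    return w1, [0.0] * hidden, w2, [0.0] * out_dim


def mlp_forward(params, inp):
    """out = W2 @ relu(W1 @ inp + b1) + b2, written with plain lists."""
    w1, b1, w2, b2 = params
    h = [max(0.0, sum(w * x for w, x in zip(row, inp)) + b) for row, b in zip(w1, b1)]
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(w2, b2)]


def control_point_offsets(params, control_points, t):
    """Query the MLP with (x, y, z, t) of each control point -> 3D translation."""
    return [mlp_forward(params, [*p, t]) for p in control_points]


def deform_centers(centers, control_points, offsets, k=4, sigma=0.5):
    """Translate each dynamic Gaussian center by an RBF-weighted blend of the
    offsets of its k nearest control points (rotations omitted for brevity)."""
    moved = []
    for x in centers:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in control_points]
        nearest = sorted(range(len(control_points)), key=lambda i: d2[i])[:k]
        w = [math.exp(-d2[i] / (2.0 * sigma ** 2)) for i in nearest]
        total = sum(w) or 1.0  # guard against underflow when all weights are 0
        moved.append([
            xj + sum(wi / total * offsets[i][j] for wi, i in zip(w, nearest))
            for j, xj in enumerate(x)
        ])
    return moved
```

Because static Gaussians never pass through `deform_centers`, the MLP is evaluated only at the sparse control points. This is one way to read the efficiency motivation the abstract gives for separating the two sets.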
📝 Abstract
Simultaneously localizing camera poses and constructing Gaussian radiance fields in dynamic scenes establishes a crucial bridge between 2D images and the 4D real world. Instead of removing dynamic objects as distractors and reconstructing only static environments, this paper proposes an efficient architecture that incrementally tracks camera poses and builds 4D Gaussian radiance fields of unknown scenes from a sequence of RGB-D images. First, we generate motion masks to obtain a static or dynamic prior for each pixel. To eliminate the influence of static scenes and make learning the motion of dynamic objects more efficient, we classify the Gaussian primitives into static and dynamic sets, and use sparse control points together with an MLP to model the transformation fields of the dynamic Gaussians. To learn the motion of dynamic Gaussians more accurately, we design a novel 2D optical flow map reconstruction algorithm that renders the optical flow of dynamic objects between neighboring images; the rendered flow then supervises the 4D Gaussian radiance fields alongside traditional photometric and geometric constraints. In experiments, qualitative and quantitative evaluations show that the proposed method achieves robust tracking and high-quality view synthesis in real-world environments.
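One plausible way to render a 2D flow map from Gaussians, sketched here under stated assumptions rather than as the paper's algorithm: project each Gaussian's center at frames t and t+1 with the corresponding camera poses, treat the pixel displacement as that Gaussian's flow, and alpha-composite these flows front to back with the same weights used for color. The isotropic screen-space footprint, the dict fields (`x_t`, `x_t1`, `opacity`, `scale`), the L1 loss restricted to mask-flagged pixels, and the loss weights in `total_loss` are all hypothetical.

```python
import math


def project(X, K, R, t):
    """Pinhole projection of world point X with world-to-camera pose (R, t).

    Returns ((u, v), depth) in pixels; K = (fx, fy, cx, cy).
    """
    xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    fx, fy, cx, cy = K
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy), xc[2]


def render_flow_pixel(pixel, gaussians, K, pose_t, pose_t1):
    """Alpha-composite per-Gaussian 2D displacements (frame t -> t+1) at one pixel.

    Each Gaussian is a dict with world centers 'x_t' and 'x_t1' (e.g. before and
    after the deformation field), an 'opacity', and an isotropic screen-space
    'scale' in pixels. Footprints and depth ordering are taken at frame t.
    """
    entries = []
    for g in gaussians:
        u_t, depth = project(g["x_t"], K, *pose_t)
        u_t1, _ = project(g["x_t1"], K, *pose_t1)
        d2 = (pixel[0] - u_t[0]) ** 2 + (pixel[1] - u_t[1]) ** 2
        alpha = g["opacity"] * math.exp(-d2 / (2.0 * g["scale"] ** 2))
        entries.append((depth, alpha, u_t1[0] - u_t[0], u_t1[1] - u_t[1]))
    entries.sort(key=lambda e: e[0])  # front-to-back by camera depth
    transmittance, flow = 1.0, [0.0, 0.0]
    for _, alpha, du, dv in entries:
        weight = alpha * transmittance  # same blending weight as color rendering
        flow[0] += weight * du
        flow[1] += weight * dv
        transmittance *= 1.0 - alpha
    return flow


def masked_flow_l1(rendered, target, dynamic_mask):
    """Mean L1 flow error over pixels flagged dynamic by the motion mask."""
    errs = [abs(r[0] - g[0]) + abs(r[1] - g[1])
            for r, g, m in zip(rendered, target, dynamic_mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def total_loss(l_color, l_depth, l_flow, w_color=1.0, w_depth=0.5, w_flow=0.1):
    """Weighted sum of photometric, geometric, and flow terms (weights hypothetical)."""
    return w_color * l_color + w_depth * l_depth + w_flow * l_flow
```

In a full system the target flow would have to come from elsewhere, for example an off-the-shelf optical flow estimator (an assumption here), and the per-pixel loop would run inside a differentiable rasterizer so that the flow loss sends gradients back to the dynamic Gaussians' transformation field.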