Dynamic Visual SLAM using a General 3D Prior

📅 2025-12-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Dynamic objects in natural environments degrade monocular SLAM performance by inducing pose estimation drift and map reconstruction artifacts. To address this, we propose a robust monocular SLAM framework integrating feedforward deep reconstruction with geometric patch-based online bundle adjustment. Our approach introduces a lightweight feedforward network for real-time dynamic region segmentation and removal, while leveraging predicted depth to mitigate monocular scale ambiguity. Furthermore, we design a depth-geometric patch alignment mechanism that explicitly enforces static structural consistency during optimization. This framework preserves the simplicity of monocular systems while significantly suppressing dynamic object interference in both trajectory estimation and mapping. Experimental evaluation on multiple dynamic-scene datasets demonstrates average pose accuracy improvements of 23%–41% over state-of-the-art methods, alongside superior reconstruction completeness and system stability.

Technology Category

Application Category

📝 Abstract
Reliable incremental estimation of camera poses and 3D reconstruction is key to enable various applications including robotics, interactive visualization, and augmented reality. However, this task is particularly challenging in dynamic natural environments, where scene dynamics can severely deteriorate camera pose estimation accuracy. In this work, we propose a novel monocular visual SLAM system that can robustly estimate camera poses in dynamic scenes. To this end, we leverage the complementary strengths of geometric patch-based online bundle adjustment and recent feed-forward reconstruction models. Specifically, we propose a feed-forward reconstruction model to precisely filter out dynamic regions, while also utilizing its depth prediction to enhance the robustness of the patch-based visual SLAM. By aligning depth prediction with estimated patches from bundle adjustment, we robustly handle the inherent scale ambiguities of the batch-wise application of the feed-forward reconstruction model.
Problem

Research questions and friction points this paper is trying to address.

Robust camera pose estimation in dynamic scenes
Filtering dynamic regions using feed-forward reconstruction
Handling scale ambiguities in monocular visual SLAM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular SLAM uses feed-forward model for dynamic filtering
Depth prediction aligns with bundle adjustment patches
Combines geometric and reconstruction models for robustness
🔎 Similar Papers
No similar papers found.