🤖 AI Summary
This paper addresses the challenging problem of dynamic scene reconstruction for autonomous driving from monocular RGB video alone, without explicit camera pose priors or multi-sensor inputs. The authors propose the first end-to-end dynamic NeRF framework that jointly estimates geometry, appearance, and motion without requiring ground-truth poses or auxiliary sensor data. The method introduces three key innovations: (1) a point-wise dynamic-static decoupling mechanism guided by semantic segmentation to improve separation fidelity; (2) an optical-flow-guided warped-ray consistency loss to enforce geometric and rendering coherence for moving objects; and (3) implicit camera pose optimization regularized by dynamic flow constraints. Evaluated on KITTI and Waymo Open Dataset driving sequences, the approach achieves a 23.6% PSNR gain in dynamic object reconstruction over prior methods, significantly mitigating motion blur while attaining state-of-the-art image sharpness and temporal consistency.
📝 Abstract
Dynamic scene reconstruction for autonomous driving enables vehicles to perceive and interpret complex scene changes more precisely. Dynamic Neural Radiance Fields (NeRFs) have recently shown promising capability in scene modeling. However, many existing methods rely heavily on accurate pose inputs and multi-sensor data, increasing system complexity. To address this, we propose FreeDriveRF, which reconstructs dynamic driving scenes using only sequential RGB images, without requiring pose inputs. We innovatively decouple the dynamic and static parts at the early sampling stage using semantic supervision, mitigating image blurring and artifacts. To overcome the challenges posed by object motion and occlusion in a monocular setup, we introduce a warped ray-guided dynamic object rendering consistency loss, which utilizes optical flow to better constrain the dynamic modeling process. Additionally, we incorporate the estimated dynamic flow to constrain the pose optimization process, improving the stability and accuracy of unbounded scene reconstruction. Extensive experiments on the KITTI and Waymo datasets demonstrate the superior performance of our method in dynamic scene modeling for autonomous driving.
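To make the warped-ray consistency idea concrete, below is a minimal NumPy sketch of the general pattern the abstract describes: warp the next frame back to the current one along a dense optical-flow field, then penalize the discrepancy between the rendering and the warped frame on dynamic pixels. This is an illustrative reconstruction, not the paper's actual loss; the function names (`flow_warp`, `warped_consistency_loss`), the nearest-neighbor sampling, the L1 penalty, and the mask handling are all assumptions for the sketch.

```python
import numpy as np

def flow_warp(img_next, flow):
    """Sample frame t+1 at coordinates displaced by the forward flow
    t -> t+1 (in pixels, as (dx, dy)), producing an image aligned with
    frame t. Nearest-neighbor sampling for simplicity; real systems
    would use bilinear interpolation. Returns (warped, valid_mask),
    where valid_mask marks pixels whose flow target stays in-frame."""
    H, W = img_next.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < W) & (yt >= 0) & (yt < H)
    warped = np.zeros_like(img_next)
    warped[valid] = img_next[yt[valid], xt[valid]]
    return warped, valid

def warped_consistency_loss(render_t, img_next, flow, dyn_mask):
    """Mean L1 difference between the rendering at frame t and the
    flow-warped frame t+1, restricted to dynamic pixels (per the
    semantic dynamic-object mask `dyn_mask`) whose flow stays in-frame."""
    warped, valid = flow_warp(img_next, flow)
    m = dyn_mask & valid
    if not m.any():
        return 0.0
    return float(np.abs(render_t[m] - warped[m]).mean())
```

Under this sketch, a perfect rendering of a correctly flow-aligned frame yields zero loss, while occluded or out-of-frame pixels are simply masked out rather than penalized.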