🤖 AI Summary
This paper addresses the challenging problem of dynamic scene reconstruction for autonomous driving from monocular RGB video alone, without explicit camera pose priors or multi-sensor inputs. The authors propose the first end-to-end dynamic NeRF framework that jointly estimates geometry, appearance, and motion without requiring ground-truth poses or auxiliary sensor data. The method introduces three key innovations: (1) a point-wise dynamic-static decoupling mechanism guided by semantic segmentation to improve separation fidelity; (2) an optical-flow-guided warped-ray consistency loss to enforce geometric and rendering coherence for moving objects; and (3) implicit camera pose optimization regularized by dynamic flow constraints. Evaluated on KITTI and Waymo Open Dataset driving sequences, the approach achieves a 23.6% PSNR gain in dynamic object reconstruction over prior methods, significantly mitigating motion blur while attaining state-of-the-art image sharpness and temporal consistency.
📝 Abstract
Dynamic scene reconstruction for autonomous driving enables vehicles to perceive and interpret complex scene changes more precisely. Dynamic Neural Radiance Fields (NeRFs) have recently shown promising capability in scene modeling. However, many existing methods rely heavily on accurate pose inputs and multi-sensor data, increasing system complexity. To address this, we propose FreeDriveRF, which reconstructs dynamic driving scenes using only sequential RGB images, without requiring pose inputs. We innovatively decouple the dynamic and static parts at the early sampling stage using semantic supervision, mitigating image blurring and artifacts. To overcome the challenges posed by object motion and occlusion in a monocular setup, we introduce a warped ray-guided dynamic object rendering consistency loss, which utilizes optical flow to better constrain the dynamic modeling process. Additionally, we incorporate the estimated dynamic flow to constrain the pose optimization process, improving the stability and accuracy of unbounded scene reconstruction. Extensive experiments on the KITTI and Waymo datasets demonstrate the superior performance of our method in dynamic scene modeling for autonomous driving.
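To make the warped-ray consistency idea concrete, below is a minimal NumPy sketch of the general pattern the abstract describes: warp the next frame back to the current one along a dense optical-flow field, then penalize the discrepancy between the rendering and the warped frame on dynamic pixels. This is an illustrative reconstruction, not the paper's actual loss; the function names (`flow_warp`, `warped_consistency_loss`), the nearest-neighbor sampling, the L1 penalty, and the mask handling are all assumptions for the sketch.

```python
import numpy as np

def flow_warp(img_next, flow):
    """Sample frame t+1 at coordinates displaced by the forward flow
    t -> t+1 (in pixels, as (dx, dy)), producing an image aligned with
    frame t. Nearest-neighbor sampling for simplicity; real systems
    would use bilinear interpolation. Returns (warped, valid_mask),
    where valid_mask marks pixels whose flow target stays in-frame."""
    H, W = img_next.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    valid = (xt >= 0) & (xt < W) & (yt >= 0) & (yt < H)
    warped = np.zeros_like(img_next)
    warped[valid] = img_next[yt[valid], xt[valid]]
    return warped, valid

def warped_consistency_loss(render_t, img_next, flow, dyn_mask):
    """Mean L1 difference between the rendering at frame t and the
    flow-warped frame t+1, restricted to dynamic pixels (per the
    semantic dynamic-object mask `dyn_mask`) whose flow stays in-frame."""
    warped, valid = flow_warp(img_next, flow)
    m = dyn_mask & valid
    if not m.any():
        return 0.0
    return float(np.abs(render_t[m] - warped[m]).mean())
```

Under this sketch, a perfect rendering of a correctly flow-aligned frame yields zero loss, while occluded or out-of-frame pixels are simply masked out rather than penalized.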