Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video

📅 2025-04-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the reliance of monocular-video NeRF reconstruction on accurate initial camera poses or depth priors. We propose a joint optimization framework that requires no pose initialization. Methodologically, we model camera motion as time-varying angular and linear velocities and integrate the rigid-body motion differential equations to obtain globally consistent camera poses. We further introduce a time-varying NeRF, coupled with local geometric consistency constraints across adjacent frames, enabling co-optimization of camera motion and scene geometry. Our key contribution is the first incorporation of a continuous velocity-field parameterization into NeRF joint optimization, eliminating the need for pose initialization or depth priors entirely. Experiments on Co3D and ScanNet demonstrate substantial improvements in pose and depth estimation accuracy, while novel-view synthesis quality achieves state-of-the-art performance. The implementation is publicly available.

📝 Abstract
Neural Radiance Fields (NeRF) have demonstrated a superior capability to represent 3D geometry but require accurately precomputed camera poses during training. To mitigate this requirement, existing methods that jointly optimize camera poses and NeRF often rely on good pose initialization or depth priors. However, these approaches struggle in challenging scenarios, such as large rotations, as they map each camera directly to a world coordinate system. We propose a novel method that eliminates prior dependencies by modeling continuous camera motion as time-dependent angular and linear velocities. Relative motions between cameras are first learned via velocity integration, and camera poses are then obtained by aggregating these relative motions up to a world coordinate system defined at a single time step within the video. Specifically, accurate continuous camera movements are learned through a time-dependent NeRF, which captures local scene geometry and motion by training on neighboring frames at each time step. The learned motions then enable fine-tuning the NeRF to represent the full scene geometry. Experiments on Co3D and ScanNet show that our approach achieves superior camera pose and depth estimation and comparable novel-view synthesis performance compared to state-of-the-art methods. Our code is available at https://github.com/HoangChuongNguyen/cope-nerf.
Problem

Research questions and friction points this paper is trying to address.

Optimizing camera poses and NeRF jointly without prior dependencies
Modeling continuous camera motions as time-dependent velocities
Improving camera pose and depth estimation in challenging scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models continuous camera motions as time-dependent velocities
Learns relative motions via velocity integration first
Uses time-dependent NeRF for accurate continuous movements
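The velocity-integration idea above can be sketched in a few lines. This is a minimal NumPy illustration under simplifying assumptions (a first-order Euler step per frame and the Rodrigues exponential map for rotations), not the authors' implementation; the names `so3_exp` and `integrate_poses` are hypothetical. Per-step relative motions are computed from the time-dependent velocities, then chained into poses anchored at the first time step.

```python
import numpy as np

def so3_exp(omega):
    """Rodrigues' formula: map an axis-angle vector to a rotation matrix."""
    theta = np.linalg.norm(omega)
    if theta < 1e-8:
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def integrate_poses(omegas, vels, dt):
    """Chain per-step relative motions into poses in a common world frame.

    omegas: (N, 3) angular velocities; vels: (N, 3) linear velocities.
    The world frame is anchored at the first time step, as in the paper.
    """
    R, t = np.eye(3), np.zeros(3)
    poses = [(R.copy(), t.copy())]
    for w, v in zip(omegas, vels):
        R_rel = so3_exp(w * dt)       # relative rotation over this step
        t = t + R @ (v * dt)          # translate in the current camera frame
        R = R @ R_rel                 # accumulate the rotation
        poses.append((R.copy(), t.copy()))
    return poses
```

In the paper the velocities come from a learned time-dependent network rather than fixed arrays, and the integration is differentiable so pose error can backpropagate into the velocity field; the chaining of relative motions is what avoids mapping each camera independently to the world frame.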
Hoang Chuong Nguyen
Australian National University
Wei Mao
Jose M. Alvarez
NVIDIA
Miaomiao Liu
Australian National University
Computer Vision · Machine Learning