🤖 AI Summary
NeRF-based methods for novel view synthesis from monocular video suffer from slow training, poor generalization, and low robustness on uncurated long videos. To address these limitations, this paper proposes an explicit dynamic modeling framework. Instead of relying on a global implicit representation, the approach jointly estimates depth and scene flow to construct hierarchical neural point clouds, and aggregates features across frames via explicit 3D correspondences. Notably, it is the first to introduce explicit dynamic 3D correspondence into neural point cloud-based view synthesis, eliminating the need to learn a video-canonical representation. The method accelerates training by approximately 10× while achieving state-of-the-art synthesis quality, and it maintains stable performance on long, uncurated monocular video sequences, significantly improving temporal robustness and generalization.
📝 Abstract
The introduction of neural radiance fields has greatly improved the effectiveness of view synthesis for monocular videos. However, existing algorithms struggle with uncontrolled or lengthy scenarios and require extensive training time for each new scenario. To tackle these limitations, we propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos. Rather than encoding the entirety of the scenario information into a latent representation, DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation. Specifically, this correspondence is obtained by estimating consistent depth and scene flow across frames. The acquired correspondence is then used to aggregate information from multiple reference frames to a target frame by constructing hierarchical neural point clouds. The resulting framework enables swift and accurate view synthesis for desired views of target frames. Experimental results demonstrate that our proposed method accelerates training, typically by an order of magnitude, while yielding results comparable to prior approaches. Furthermore, our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of the video content.
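As a rough illustration of the correspondence idea described in the abstract (a minimal sketch, not the authors' implementation), the snippet below lifts reference-frame pixels to 3D with a predicted depth map, displaces them by a predicted scene flow, and expresses the resulting points in the target camera's frame. All function names, the pinhole intrinsics `K`, and the relative pose `T_ref_to_tgt` are illustrative assumptions:

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map to camera-space 3D points, shape (H, W, 3).

    Assumes a pinhole camera with intrinsics K = [[fx,0,cx],[0,fy,cy],[0,0,1]].
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.stack([x, y, depth], axis=-1)

def warp_to_target(depth_ref, flow_ref_to_tgt, K, T_ref_to_tgt):
    """Build an explicit 3D correspondence from a reference frame to a target frame.

    depth_ref:        (H, W) predicted depth for the reference frame
    flow_ref_to_tgt:  (H, W, 3) predicted per-pixel 3D scene flow
    T_ref_to_tgt:     (4, 4) relative camera pose (reference -> target)
    Returns (H*W, 3) points in the target camera's coordinate frame.
    """
    pts_ref = backproject(depth_ref, K).reshape(-1, 3)
    # Scene flow moves each point to its corresponding 3D location at the target time.
    pts_moved = pts_ref + flow_ref_to_tgt.reshape(-1, 3)
    R, t = T_ref_to_tgt[:3, :3], T_ref_to_tgt[:3, 3]
    return pts_moved @ R.T + t
```

In the paper's pipeline, points warped this way from several reference frames would carry their features into the target frame's hierarchical neural point cloud; this sketch only shows the geometric warp itself.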