GRVS: a Generalizable and Recurrent Approach to Monocular Dynamic View Synthesis

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of novel view synthesis from monocular videos of dynamic scenes, including geometric inconsistency, high computational cost, and failures in reconstructing dynamic regions. The authors propose the first neural rendering method to combine generality, a recurrent architecture, and efficient dynamic plane sweeping. The recurrent structure enables unbounded, asynchronous mapping between input and target video frames, while dynamic plane sweeping decouples camera motion from scene dynamics and enables precise six-degree-of-freedom camera control. Evaluated on the UCSD dataset and on the newly introduced Kubric-4D-dyn benchmark, the method outperforms existing Gaussian-splatting and diffusion-based models, achieving superior geometric consistency and reconstruction quality in both static and dynamic scene regions.
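To make the plane-sweep component concrete, below is a minimal, hypothetical sketch of building a plane-sweep volume by warping a source frame onto fronto-parallel depth planes in the target camera's frustum. This is not the authors' implementation: the function name, the fronto-parallel plane parameterization, and the per-plane homography H_d = K_src (R − t nᵀ / d) K_tgt⁻¹ are standard plane-sweep ingredients assumed here for illustration.

```python
import torch
import torch.nn.functional as F

def plane_sweep_volume(src_img, K_src, K_tgt, R, t, depths):
    """Warp one source frame onto fronto-parallel depth planes in the
    target camera's frustum, yielding a (D, C, H, W) plane-sweep volume.

    src_img: (C, H, W); K_src, K_tgt: (3, 3) intrinsics; R, t: rotation
    and translation of the source camera relative to the target camera;
    depths: iterable of candidate plane depths (in the target frame).
    This is an illustrative sketch, not the paper's code.
    """
    C, H, W = src_img.shape
    n = torch.tensor([0.0, 0.0, 1.0])      # fronto-parallel plane normal
    K_tgt_inv = torch.inverse(K_tgt)

    # Homogeneous pixel grid of the target view.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)

    planes = []
    for d in depths:
        # Plane-induced homography: H_d = K_src (R - t n^T / d) K_tgt^{-1}.
        H_d = K_src @ (R - torch.outer(t, n) / d) @ K_tgt_inv
        warped = pix @ H_d.T
        uv = warped[:, :2] / warped[:, 2:3].clamp(min=1e-6)
        # Normalize to [-1, 1] as expected by grid_sample.
        u = 2.0 * uv[:, 0] / (W - 1) - 1.0
        v = 2.0 * uv[:, 1] / (H - 1) - 1.0
        grid = torch.stack([u, v], dim=-1).reshape(1, H, W, 2)
        planes.append(F.grid_sample(src_img[None], grid, align_corners=True)[0])
    return torch.stack(planes)             # (D, C, H, W)
```

Under this reading, camera-induced parallax is explained by the sweep itself, so a downstream model can attribute any residual misalignment across planes to scene motion rather than to the moving camera.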
📝 Abstract
Synthesizing novel views from monocular videos of dynamic scenes remains a challenging problem. Scene-specific methods that optimize 4D representations with explicit motion priors often break down in highly dynamic regions where multi-view information is hard to exploit. Diffusion-based approaches that integrate camera control into large pre-trained models can produce visually plausible videos but frequently suffer from geometric inconsistencies across both static and dynamic areas. Both families of methods also require substantial computational resources. Building on the success of generalizable models for static novel view synthesis, we adapt the framework to dynamic inputs and propose a new model with two key components: (1) a recurrent loop that enables unbounded, asynchronous mapping between input and target videos, and (2) an efficient use of plane sweeps over dynamic inputs to disentangle camera and scene motion and achieve fine-grained, six-degree-of-freedom camera control. We train and evaluate our model on the UCSD dataset and on Kubric-4D-dyn, a new monocular dynamic dataset featuring longer, higher-resolution sequences with more complex scene dynamics than existing alternatives. Our model outperforms four scene-specific Gaussian-splatting approaches, as well as two diffusion-based approaches, in reconstructing fine-grained geometric details across both static and dynamic regions.
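The abstract's first component, the recurrent loop, can be sketched as a hidden scene state updated once per input frame, with target views decoded on their own schedule. The module below is purely illustrative: the class name, feature dimensions, pooled per-frame feature, and the 16-dimensional pose/time embedding are assumptions, and the linear decoder is a toy stand-in for a full image decoder.

```python
import torch
import torch.nn as nn

class RecurrentRenderer(nn.Module):
    """Illustrative sketch (not the paper's architecture): a GRU state
    folds in input frames one at a time; target views are decoded as
    their timestamps are reached, so the input and target streams stay
    asynchronous and the input length is unbounded."""

    def __init__(self, feat_dim=64, hidden_dim=128, query_dim=16):
        super().__init__()
        self.encode = nn.Conv2d(3, feat_dim, 3, padding=1)   # per-frame features
        self.update = nn.GRUCell(feat_dim, hidden_dim)       # recurrent state update
        self.decode = nn.Linear(hidden_dim + query_dim, 3)   # toy view decoder

    def forward(self, input_frames, input_times, target_queries):
        # input_frames: list of (3, H, W) tensors with timestamps input_times;
        # target_queries: time-ordered list of (time, pose_embedding) pairs,
        # where pose_embedding is a (query_dim,) stand-in for 6-DoF pose + time.
        h, outputs, q = None, [], 0
        for frame, t in zip(input_frames, input_times):
            feat = self.encode(frame[None]).mean(dim=(2, 3))  # (1, feat_dim)
            h = self.update(feat, h)                          # fold frame into state
            # Emit every target view whose timestamp is now covered; the two
            # timelines advance independently (asynchronous mapping).
            while q < len(target_queries) and target_queries[q][0] <= t:
                emb = target_queries[q][1][None]              # (1, query_dim)
                outputs.append(self.decode(torch.cat([h, emb], dim=-1)))
                q += 1
        return outputs
```

Because the state is updated incrementally, such a loop processes arbitrarily long input videos with constant memory, which is one plausible reading of "unbounded" in the abstract.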
Problem

Research questions and friction points this paper is trying to address.

monocular dynamic view synthesis
novel view synthesis
dynamic scenes
geometric consistency
camera control
Innovation

Methods, ideas, or system contributions that make the work stand out.

monocular dynamic view synthesis
recurrent modeling
plane sweeps
6-DoF camera control
generalizable neural rendering