🤖 AI Summary
Existing video reshooting methods often suffer from depth estimation artifacts in real-world dynamic videos and struggle to both preserve content appearance and maintain precise camera control. This work proposes Vista4D, a reshooting framework that grounds the input video and the target cameras in a shared 4D point cloud, explicitly preserving seen content while providing rich camera signals. The representation is built with static pixel segmentation and 4D reconstruction, and the model is trained on reconstructed multiview dynamic data so that it is robust to point cloud artifacts at real-world inference time. Experiments show improved 4D consistency, camera control, and visual quality over state-of-the-art baselines across a variety of videos and camera paths, and the method generalizes to applications such as dynamic scene expansion and 4D scene recomposition.
📝 Abstract
We present Vista4D, a robust and flexible video reshooting framework that grounds the input video and target cameras in a 4D point cloud. Specifically, given an input video, our method re-synthesizes the scene with the same dynamics from a different camera trajectory and viewpoint. Existing video reshooting methods often struggle with depth estimation artifacts in real-world dynamic videos, and they also fail to preserve content appearance or maintain precise camera control for challenging new trajectories. We build a 4D-grounded point cloud representation with static pixel segmentation and 4D reconstruction to explicitly preserve seen content and provide rich camera signals, and we train with reconstructed multiview dynamic data for robustness against point cloud artifacts during real-world inference. Our results demonstrate improved 4D consistency, camera control, and visual quality compared to state-of-the-art baselines under a variety of videos and camera paths. Moreover, our method generalizes to real-world applications such as dynamic scene expansion and 4D scene recomposition. See our project page for results, code, and models: https://eyeline-labs.github.io/Vista4D
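As a rough illustration of what grounding a video and a target camera in a 4D point cloud can look like, the toy sketch below lifts pixels to 3D with a pinhole model, tags each point with its frame index and a static flag, and projects the points visible at a given time into a translated target camera. It is not the Vista4D implementation. All names (`Point4D`, `build_cloud`, `visible_points`, `project`) are hypothetical, the source camera is assumed fixed at the world origin, the target camera is assumed to differ by translation only, and the rule that static points persist across all frames is one possible reading of "explicitly preserve seen content", not something the abstract specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point4D:
    xyz: tuple    # world-space position (x, y, z)
    t: int        # index of the source frame the point came from
    static: bool  # True if the pixel was segmented as static


def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth value to a 3D point (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def build_cloud(frames, fx, fy, cx, cy):
    """Build a time-tagged point cloud.

    frames: list (one entry per frame) of lists of (u, v, depth, is_static).
    Assumption: the source camera sits at the world origin in every frame.
    """
    cloud = []
    for t, samples in enumerate(frames):
        for u, v, depth, is_static in samples:
            xyz = unproject(u, v, depth, fx, fy, cx, cy)
            cloud.append(Point4D(xyz, t, is_static))
    return cloud


def visible_points(cloud, t):
    """Assumed rule: static points persist over time, dynamic points do not."""
    return [p for p in cloud if p.static or p.t == t]


def project(point, cam_offset, fx, fy, cx, cy):
    """Project into a target camera translated by cam_offset (no rotation)."""
    x, y, z = (point.xyz[i] - cam_offset[i] for i in range(3))
    if z <= 0:
        return None  # point is behind the target camera
    return (fx * x / z + cx, fy * y / z + cy)


# Two frames: one static and one dynamic sample in frame 0, one dynamic in frame 1.
frames = [
    [(50, 50, 2.0, True), (60, 50, 1.0, False)],
    [(70, 50, 1.0, False)],
]
cloud = build_cloud(frames, fx=100.0, fy=100.0, cx=50.0, cy=50.0)
at_t1 = visible_points(cloud, t=1)
static_uv = project(at_t1[0], (0.5, 0.0, 0.0), 100.0, 100.0, 50.0, 50.0)
```

In this toy setup, the static point observed in frame 0 is still available when rendering frame 1 from the shifted camera, while the frame-0 dynamic point is dropped. A real system would additionally need per-frame camera poses, rotation, occlusion handling, and a generative model to fill regions the point cloud does not cover.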