Plenoptic Video Generation

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of spatiotemporal inconsistency in multi-view video generation, particularly in hallucinated regions caused by model stochasticity. To this end, the authors propose PlenopticDreamer, a framework that employs an autoregressively trained multi-input single-output video conditional model, coupled with a camera-guided video retrieval strategy to dynamically select previously generated salient videos as conditioning inputs. The method further introduces a spatiotemporal memory synchronization mechanism that integrates progressive context scaling, self-conditioning enhancement, and long-video conditioning to effectively mitigate error accumulation. Evaluated on the Basic and Agibot benchmarks, PlenopticDreamer achieves state-of-the-art performance, enabling high-fidelity video re-rendering with precise camera control, strong cross-view consistency, and diverse viewpoint transitions, such as from third-person to robotic-gripper perspectives.

๐Ÿ“ Abstract
Camera-controlled generative video re-rendering methods, such as ReCamMaster, have achieved remarkable progress. However, despite their success in the single-view setting, these works often struggle to maintain consistency across multi-view scenarios. Ensuring spatio-temporal coherence in hallucinated regions remains challenging due to the inherent stochasticity of generative models. To address this, we introduce PlenopticDreamer, a framework that synchronizes generative hallucinations to maintain spatio-temporal memory. The core idea is to train a multi-in-single-out video-conditioned model in an autoregressive manner, aided by a camera-guided video retrieval strategy that adaptively selects salient videos from previous generations as conditional inputs. In addition, our training incorporates progressive context scaling to improve convergence, self-conditioning to enhance robustness against long-range visual degradation caused by error accumulation, and a long-video conditioning mechanism to support extended video generation. Extensive experiments on the Basic and Agibot benchmarks demonstrate that PlenopticDreamer achieves state-of-the-art video re-rendering, delivering superior view synchronization, high-fidelity visuals, accurate camera control, and diverse view transformations (e.g., third-person to third-person, and head-view to gripper-view in robotic manipulation). Project page: https://research.nvidia.com/labs/dir/plenopticdreamer/
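To make the camera-guided retrieval idea concrete: the abstract describes selecting salient previously generated videos as conditioning inputs based on camera information. The sketch below is a hypothetical, simplified illustration of that selection step, not the paper's actual algorithm; the pose representation (position plus viewing direction), the distance weighting, and all function names are assumptions for illustration only.

```python
import math

def pose_distance(p1, p2, w_rot=1.0):
    # Hypothetical camera-pose distance: Euclidean gap between camera
    # positions plus a weighted angle between unit viewing directions.
    trans = math.dist(p1["pos"], p2["pos"])
    dot = sum(a * b for a, b in zip(p1["dir"], p2["dir"]))
    angle = math.acos(max(-1.0, min(1.0, dot)))
    return trans + w_rot * angle

def retrieve_conditioning_clips(target_pose, generated_clips, k=2):
    # Rank previously generated clips by camera proximity to the target
    # view and keep the k closest as conditioning inputs (a stand-in for
    # the paper's saliency-based selection).
    ranked = sorted(
        generated_clips,
        key=lambda clip: pose_distance(clip["pose"], target_pose),
    )
    return ranked[:k]

# Toy usage: three earlier clips, one target camera near clip "v0".
generated = [
    {"id": "v0", "pose": {"pos": (0.0, 0.0, 0.0), "dir": (0.0, 0.0, 1.0)}},
    {"id": "v1", "pose": {"pos": (5.0, 0.0, 0.0), "dir": (0.0, 0.0, 1.0)}},
    {"id": "v2", "pose": {"pos": (0.5, 0.0, 0.0), "dir": (0.0, 0.0, 1.0)}},
]
target = {"pos": (0.0, 0.0, 0.2), "dir": (0.0, 0.0, 1.0)}
selected = retrieve_conditioning_clips(target, generated, k=2)
print([clip["id"] for clip in selected])  # → ['v0', 'v2']
```

In an autoregressive generation loop, a step like this would run before synthesizing each new view, so that the model conditions on the prior outputs most relevant to the requested camera trajectory.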
Problem

Research questions and friction points this paper is trying to address.

multi-view consistency
spatio-temporal coherence
generative video re-rendering
hallucinated regions
camera-controlled generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

plenoptic video generation
spatio-temporal coherence
camera-guided retrieval
self-conditioning
autoregressive video modeling