DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis

📅 2026-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing trajectory-controlled video generation methods degrade significantly under large-view camera motions, and existing extreme-view solutions rely on dedicated video pairs that are costly to annotate. To address these limitations, this work proposes DEVIS-GRPO, a GRPO-based framework that the authors describe as the first online policy gradient method for extreme-view video generation. Its core is the Accumulative Dynamic Extreme VIew Synthesis (ADEVIS) sampling strategy, which reaches large-view camera motions by progressively accumulating small-view increments. This removes the need to warm-start the policy with pre-collected large-view paired videos and increases sampling diversity by varying trajectory configurations. A multi-level consistency-quality reward function then selects high-quality samples for model optimization. Experiments cover the Kubric-4D, iPhone, and DL3DV datasets. On Kubric-4D, the method achieves relative improvements of 21.57% in PSNR and 7.31% in SSIM over the second-best method in non-occlusion areas; on iPhone, it reduces LPIPS by 18.56%.
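To make the GRPO step concrete, here is a minimal sketch of group-relative advantage computation as it is commonly formulated for GRPO: each sample's reward is normalized by the mean and standard deviation of its own sampled group. The two-term `combined_reward` and its weights are hypothetical stand-ins; the summary does not specify how the paper's multi-level consistency-quality reward is composed.

```python
from statistics import mean, pstdev


def combined_reward(consistency, quality, w_consistency=0.5, w_quality=0.5):
    """Hypothetical reward: a weighted sum of a consistency score and a quality score.

    The actual multi-level reward used by DEVIS-GRPO is not described here;
    this form and its weights are assumptions for illustration only.
    """
    return w_consistency * consistency + w_quality * quality


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled videos, each scored as (consistency, quality).
scores = [(0.9, 0.8), (0.4, 0.6), (0.7, 0.7), (0.2, 0.3)]
rewards = [combined_reward(c, q) for c, q in scores]
advantages = group_relative_advantages(rewards)
```

Samples scoring above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged; no separately learned value function is needed.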

📝 Abstract

Trajectory-controlled video generation has become essential for controllable video generation. While current methods perform well under small-view camera motions, they degrade significantly with large-view motions. Existing solutions for extreme-view synthesis typically require dedicated video pairs, demanding substantial annotation effort. To address these limitations, we propose Dynamic Extreme VIew Synthesis-GRPO (DEVIS-GRPO), a GRPO-based framework for trajectory-controlled video generation, the first online policy gradient method for extreme view video generation. Central to our approach is a novel sampling strategy: Accumulative Dynamic Extreme VIew Synthesis (ADEVIS), which achieves large-view camera motions by progressively accumulating small-view increments. This method delivers two key advantages: 1) enhanced training efficiency, as it eliminates the need to warm-start the policy model by collecting expensive paired large-view videos, and 2) increased sampling diversity, achieved by flexibly varying trajectory configurations. Finally, we designed a multi-level consistency-quality reward function to select high-quality samples for model optimization. Experiments on the Kubric-4D, iPhone, and DL3DV datasets demonstrate our method's superiority. On Kubric-4D, we achieve relative improvements of 21.57% in PSNR and 7.31% in SSIM over the second-best method in non-occlusion areas. On iPhone, LPIPS is reduced by 18.56%.
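The accumulation idea behind ADEVIS can be illustrated with a small sketch. It assumes, purely for illustration, that a camera trajectory is a single-axis orbit parameterized by azimuth angle; the function names, the step limit, and the orbit model are hypothetical and are not taken from the paper, which may parameterize trajectories differently.

```python
import math


def accumulate_view_increments(target_deg, max_step_deg):
    """Split a large rotation into equal small increments; return the cumulative angles.

    Assumption: a "small-view" motion is any rotation of at most max_step_deg.
    """
    if max_step_deg <= 0:
        raise ValueError("max_step_deg must be positive")
    n_steps = max(1, math.ceil(abs(target_deg) / max_step_deg))
    step = target_deg / n_steps
    return [step * (i + 1) for i in range(n_steps)]


def orbit_position(azimuth_deg, radius=1.0):
    """Camera position on a horizontal circle around the origin (hypothetical orbit model)."""
    a = math.radians(azimuth_deg)
    return (radius * math.sin(a), 0.0, radius * math.cos(a))


# Reach a 90-degree view change through increments of at most 15 degrees each.
angles = accumulate_view_increments(90.0, 15.0)
positions = [orbit_position(a) for a in angles]
```

Varying `target_deg`, `max_step_deg`, or the orbit axis yields different trajectory configurations, which loosely mirrors the sampling diversity the abstract attributes to ADEVIS.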

Problem

Research questions and friction points this paper is trying to address.

trajectory-controlled video generation

extreme view synthesis

large-view camera motion

video generation

annotation effort

Innovation

Methods, ideas, or system contributions that make the work stand out.

GRPO

Extreme View Synthesis

Trajectory-controlled Video Generation