VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

📅 2026-01-30
🤖 AI Summary
Existing video diffusion models often suffer from object deformation or spatial drift because their training lacks explicit 3D structural constraints. This work proposes a self-supervised framework that injects geometric priors into video generation training in the form of preference pairs. A geometry foundation model produces dense 3D-consistency signals for generated videos, and these signals serve as preference labels. Direct Preference Optimization (DPO) then steers the diffusion model toward outputs that are more geometrically consistent and temporally coherent. The approach needs no human annotations and only a small number of preference pairs. The authors report improved temporal stability, physical plausibility, and motion coherence, outperforming state-of-the-art baselines across multiple evaluation metrics.
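The summary does not spell out how preference pairs are formed from the geometry signals. The following is a minimal sketch that assumes each prompt is sampled several times and that each generated video receives a scalar geometric-consistency error, where lower is better. Such an error could be a reprojection error computed from a geometry foundation model's depth and camera estimates, but that interpretation is an assumption. Under these assumptions, the best- and worst-scoring candidates become the chosen and rejected videos. The function `build_preference_pairs`, the `PreferencePair` record, and the `min_margin` filter are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PreferencePair:
    prompt_id: int
    chosen: int    # index of the more 3D-consistent candidate (lowest error)
    rejected: int  # index of the less 3D-consistent candidate (highest error)
    margin: float  # error gap between rejected and chosen


def build_preference_pairs(
    scores: Sequence[Sequence[float]],
    min_margin: float = 0.0,
) -> List[PreferencePair]:
    """Hypothetical pair construction: best vs. worst candidate per prompt.

    scores[p][i] is an assumed geometric-consistency error for candidate
    video i of prompt p (lower = more consistent).
    """
    pairs: List[PreferencePair] = []
    for p, cand in enumerate(scores):
        if len(cand) < 2:
            continue  # a pair needs at least two candidates
        best = min(range(len(cand)), key=lambda i: cand[i])
        worst = max(range(len(cand)), key=lambda i: cand[i])
        margin = cand[worst] - cand[best]
        # Skip near-ties, where the geometry signal cannot separate the samples.
        if margin > min_margin:
            pairs.append(PreferencePair(p, best, worst, margin))
    return pairs


# Example: prompt 0 yields a clear pair; prompt 1 is a near-tie and is dropped.
pairs = build_preference_pairs([[0.12, 0.40, 0.25], [0.30, 0.31]], min_margin=0.05)
print(pairs)
```

The margin filter reflects a common design choice in preference-data pipelines: pairs whose scores are nearly equal carry little signal and can add label noise.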

📝 Abstract
While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, physical plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.
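The abstract states that preference signals guide the VDM via DPO but does not give the exact objective. For diffusion models, a widely used form is the Diffusion-DPO loss, which compares denoising errors of the trainable policy and a frozen reference model on the preferred and rejected samples. The following is a simplified single-pair, single-timestep sketch of that formulation. Whether VideoGPA uses exactly this form, and which `beta` it uses, is not stated here. All names are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def diffusion_dpo_loss(
    err_policy_w: float,
    err_ref_w: float,
    err_policy_l: float,
    err_ref_l: float,
    beta: float,
) -> float:
    """Diffusion-DPO style loss for one preference pair at one noise level.

    Each err_* is a denoising error ||eps - eps_model(x_t, t)||^2 on the
    preferred (w) or rejected (l) video, from the trainable policy or the
    frozen reference model. beta is a regularization hyperparameter.
    """
    # Change in denoising error relative to the reference model
    # (negative = the policy denoises this sample better than the reference).
    delta_w = err_policy_w - err_ref_w
    delta_l = err_policy_l - err_ref_l
    # Lower loss when the policy improves more on the preferred sample
    # than on the rejected one.
    return -log_sigmoid(-beta * (delta_w - delta_l))


# Policy identical to the reference: loss equals log(2).
print(diffusion_dpo_loss(1.0, 1.0, 1.0, 1.0, beta=10.0))
```

When the policy matches the reference, the loss is log 2. The loss falls below that value as the policy fits the more geometrically consistent video better than the rejected one, relative to the reference.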
Problem

Research questions and friction points this paper is trying to address.

3D consistency
video generation
geometric coherence
temporal stability
spatial drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

VideoGPA
3D consistency
geometry prior
Direct Preference Optimization
video diffusion models