GeCo: A Differentiable Geometric Consistency Metric for Video Generation

📅 2025-12-24

🤖 AI Summary

GeCo is a differentiable geometric consistency metric that targets two artifacts common in generated videos of static scenes: geometric deformation and occlusion inconsistency. It detects both jointly by combining residual optical flow with monocular depth priors and applying differentiable geometric reprojection, which yields dense, interpretable consistency maps. Because the metric is differentiable, it can act as a training-free, model-agnostic guidance loss at generation time. This requires no fine-tuning, parameter updates, or architecture-specific adaptation, whereas prior methods need model-specific tuning. The authors use GeCo to benchmark several recent video generation models, which uncovers common failure modes, and they report that GeCo-based guidance reduces deformation artifacts in the generated videos. GeCo therefore serves as a general-purpose, plug-and-play tool for both post-hoc analysis and generation-time refinement.
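To make the residual-motion idea concrete, here is a minimal NumPy sketch of one way a static-scene consistency map could be computed. It assumes per-pixel depth for the first frame, pinhole intrinsics `K`, a known relative camera pose `(R, t)`, and an observed optical flow field. The function names `rigid_flow` and `consistency_map` are hypothetical. GeCo's actual pipeline, which estimates these quantities with learned models and also handles occlusion, is not reproduced here. The reasoning is that in a static scene all apparent motion should be explained by camera motion, so a large residual between the observed flow and the depth-induced rigid flow flags a possible deformation.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow that a static scene would show under camera motion (R, t).

    Assumptions (illustrative): pinhole intrinsics K (3x3), depth of frame 1
    as an (H, W) array, and a rigid transform X2 = R @ X1 + t taking
    frame-1 camera coordinates to frame-2 camera coordinates.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W].astype(np.float64)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3) homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                   # back-project to unit-depth rays
    X1 = rays * depth[..., None]                      # 3D points in camera-1 frame
    X2 = X1 @ R.T + t                                 # move points into camera-2 frame
    proj = X2 @ K.T                                   # apply intrinsics
    uv2 = proj[..., :2] / proj[..., 2:3]              # perspective divide
    return uv2 - pix[..., :2]                         # (H, W, 2) depth-induced flow

def consistency_map(observed_flow, depth, K, R, t):
    """Per-pixel magnitude of the motion that camera motion cannot explain."""
    residual = observed_flow - rigid_flow(depth, K, R, t)
    return np.linalg.norm(residual, axis=-1)          # (H, W); 0 means consistent

# Usage: constant depth 2, focal length 100, lateral translation 0.1.
K = np.array([[100.0, 0.0, 8.0], [0.0, 100.0, 8.0], [0.0, 0.0, 1.0]])
depth = np.full((16, 16), 2.0)
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
observed = rigid_flow(depth, K, R, t).copy()
observed[3, 4] += np.array([3.0, 4.0])                # inject a local "deformation"
cmap = consistency_map(observed, depth, K, R, t)
```

In this setup the rigid flow is a uniform 5-pixel horizontal shift, so the map is zero everywhere except at the perturbed pixel, where it equals 5. Every operation is differentiable, which is what allows a metric of this form to be used as a loss, though in practice it would be written in an autodiff framework rather than plain NumPy.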

📝 Abstract

We introduce GeCo, a geometry-grounded metric for jointly detecting geometric deformation and occlusion-inconsistency artifacts in static scenes. By fusing residual motion and depth priors, GeCo produces interpretable, dense consistency maps that reveal these artifacts. We use GeCo to systematically benchmark recent video generation models, uncovering common failure modes, and further employ it as a training-free guidance loss to reduce deformation artifacts during video generation.
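The training-free guidance idea can be illustrated with a toy example: the generator's weights stay frozen, and only the latent being sampled is nudged along the negative gradient of a differentiable consistency loss. This NumPy sketch assumes a hypothetical linear "decoder" `A` that maps a latent `z` to a flow field, together with a target rigid flow, so that the gradient can be written by hand. In a real diffusion or flow-matching sampler, the loss would be a GeCo-style score on decoded frames, the gradient would come from autodiff, and the updates would typically be interleaved with denoising steps. Names such as `guided_update` are illustrative and do not come from the paper.

```python
import numpy as np

def loss(z, A, target_flow):
    """Mean squared residual between decoded flow A @ z and the target flow."""
    return float(np.mean((A @ z - target_flow) ** 2))

def guided_update(z, A, target_flow, step_size=0.1, n_steps=50):
    """Gradient descent on the latent z only; A (the frozen 'generator') is never modified.

    For L(z) = mean((A @ z - target) ** 2), the gradient is
    (2 / N) * A.T @ (A @ z - target), where N = target_flow.size.
    """
    n = target_flow.size
    for _ in range(n_steps):
        residual = A @ z - target_flow
        grad = (2.0 / n) * A.T @ residual
        z = z - step_size * grad      # returns a new array; the caller's z is untouched
    return z

# Usage with random toy data.
rng = np.random.default_rng(0)
A = rng.normal(size=(32, 8))          # hypothetical frozen decoder
target = rng.normal(size=32)          # hypothetical depth-induced rigid flow
z0 = np.zeros(8)
z1 = guided_update(z0, A, target)
```

The point of the design is that nothing is trained: the same frozen model is steered at sampling time by an external, differentiable score. This is why such guidance can, in principle, be applied across different generators without fine-tuning.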

Problem

Detects geometric deformation and occlusion artifacts

Produces interpretable dense consistency maps

Benchmarks video models and reduces deformation artifacts

Innovation

Fuses residual motion and depth priors

Produces interpretable dense consistency maps

Serves as training-free guidance loss

🔎 Similar Papers

Detecting AI-Generated Video via Frame Consistency