🤖 AI Summary
This paper introduces GeCo, a differentiable geometric-consistency metric that targets two common artifacts in generated videos of static scenes: geometric deformation and occlusion inconsistency. GeCo detects both jointly by combining residual optical flow with monocular depth priors and applying differentiable geometric reprojection, which yields dense, interpretable consistency maps. Because the metric is differentiable, it can also be used as a training-free, model-agnostic guidance loss during generation, with no fine-tuning, parameter updates, or architecture-specific adaptation, unlike approaches that require model-specific tuning. Evaluation across multiple state-of-the-art video generation models is reported to uncover common failure modes and to show that GeCo guidance suppresses geometric distortions while improving geometric fidelity and spatiotemporal coherence. GeCo thus serves as a general-purpose, plug-and-play metric for both post-hoc analysis and guided refinement of generated videos.
📝 Abstract
We introduce GeCo, a geometry-grounded metric for jointly detecting geometric deformation and occlusion-inconsistency artifacts in static scenes. By fusing residual motion and depth priors, GeCo produces interpretable, dense consistency maps that reveal these artifacts. We use GeCo to systematically benchmark recent video generation models, uncovering common failure modes, and further employ it as a training-free guidance loss to reduce deformation artifacts during video generation.
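To make the idea of fusing residual motion with depth concrete, here is a minimal NumPy sketch of one way a residual-motion consistency map could be computed for a static scene. It is not the authors' implementation. The function names `rigid_flow` and `residual_motion_map` are hypothetical, and the sketch assumes that per-pixel depth, camera intrinsics `K`, a relative camera pose `(R, t)`, and an observed optical flow field are already available (the abstract does not say how these are obtained). In a static scene, the flow predicted by reprojecting depth under camera motion should match the observed flow, so a large residual marks a candidate deformation region. Occlusion handling is omitted, and NumPy is not differentiable, so this illustrates only the geometry, not the guidance loss.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow induced by camera motion alone, assuming a static scene.

    depth: (H, W) depth of frame 1.
    K:     (3, 3) camera intrinsics.
    R, t:  rotation (3, 3) and translation (3,) mapping frame-1 camera
           coordinates to frame-2 camera coordinates (assumed known).
    Returns an (H, W, 2) flow field in pixels.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                     # back-project: K^-1 p
    pts1 = rays * depth[..., None]                      # 3D points, frame 1
    pts2 = pts1 @ R.T + t                               # 3D points, frame 2
    proj = pts2 @ K.T                                   # reproject into frame 2
    uv2 = proj[..., :2] / proj[..., 2:3]                # perspective divide
    return uv2 - pix[..., :2]

def residual_motion_map(observed_flow, depth, K, R, t):
    """Per-pixel magnitude of the flow not explained by camera motion."""
    residual = observed_flow - rigid_flow(depth, K, R, t)
    return np.linalg.norm(residual, axis=-1)
```

In this sketch, a pixel whose observed motion is fully explained by depth and camera motion gets a value near zero, while a region that deforms independently of the camera stands out in the map. A guidance loss in the spirit of the abstract would need the same computation expressed in an automatic-differentiation framework.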