🤖 AI Summary
Existing video generation models often suffer from spatial distortions due to inconsistent 3D geometry, yet prevailing evaluation metrics such as FVD struggle to distinguish plausible motion dynamics from geometric inaccuracies. To address this gap, this work proposes the Spatial Geometric Consistency (SGC) metric, which segments static and dynamic regions in generated videos and enforces 3D consistency by partitioning the static background into spatially coherent subregions. For each subregion, SGC estimates depth and local camera poses, then quantifies geometric fidelity through multi-view pose divergence. SGC is the first metric capable of accurately detecting geometric distortions while preserving sensitivity to realistic dynamic content. It demonstrates superior robustness and discriminative power over existing methods on both real-world and synthetic videos, effectively addressing a critical limitation in current video generation evaluation frameworks.
📝 Abstract
Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D \textbf{S}patial \textbf{G}eometric \textbf{C}onsistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each subregion, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generative videos demonstrate that SGC robustly quantifies geometric inconsistencies, effectively identifying critical failures missed by existing metrics.