🤖 AI Summary
Synchronizing multiple videos that span diverse scenarios, or that were produced by generative AI, is difficult: subjects and backgrounds differ, and the temporal misalignment between videos is nonlinear. This paper introduces the first time-alignment framework designed specifically for multiple generative videos. The method constructs a shared, one-dimensional temporal prototype sequence: high-dimensional frame embeddings, extracted with pretrained vision models, are compressed through prototype learning into compact, comparable temporal representations. This eliminates the computational overhead of exhaustive pairwise matching and anchors key action phases in a single unified sequence. The framework also supports fine-grained frame retrieval and phase classification. Extensive experiments across multiple benchmarks demonstrate significant improvements in synchronization accuracy, robustness, and efficiency. To foster reproducibility and further research, the authors release both the source code and a newly curated benchmark dataset for generative video synchronization.
📝 Abstract
Synchronizing videos captured simultaneously from multiple cameras in the same scene is often easy and typically requires only simple time shifts. However, synchronizing videos from different scenes or, more recently, generative AI videos, poses a far more complex challenge due to diverse subjects, backgrounds, and nonlinear temporal misalignment. We propose Temporal Prototype Learning (TPL), a prototype-based framework that constructs a shared, compact 1D representation from high-dimensional embeddings extracted by any of several pretrained models. TPL robustly aligns videos by learning a unified prototype sequence that anchors key action phases, thereby avoiding exhaustive pairwise matching. Our experiments show that TPL improves synchronization accuracy, efficiency, and robustness across diverse datasets, including fine-grained frame retrieval and phase classification tasks. Importantly, TPL is the first approach to mitigate synchronization issues in multiple generative AI videos depicting the same action. Our code and a new multi-video synchronization dataset are available at https://bgu-cs-vil.github.io/TPL/.
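The core idea described above can be illustrated with a minimal sketch. Note the hedging: the paper's actual prototype-learning objective and embedding models are not specified here, so this toy uses 2-D points as stand-ins for pretrained frame embeddings and a simple k-means-style refinement (with deterministic farthest-point initialization) as a stand-in for TPL's prototype learning. The point it demonstrates is the avoidance of pairwise matching: each video is independently compressed into a 1D sequence of shared prototype indices, and videos of the same action at different speeds then visit the same prototypes in the same order.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic farthest-point initialization for the prototypes."""
    idx = [0]
    for _ in range(k - 1):
        dists = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=-1)
        idx.append(int(dists.min(axis=1).argmax()))
    return X[idx].copy()


def learn_prototypes(X, k, iters=20):
    """Lloyd-style refinement over frames pooled from all videos.
    A stand-in for TPL's prototype learning, whose exact objective
    is not given in the abstract."""
    protos = farthest_point_init(X, k)
    for _ in range(iters):
        assign = np.linalg.norm(
            X[:, None, :] - protos[None, :, :], axis=-1
        ).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                protos[j] = X[assign == j].mean(axis=0)
    return protos


def to_prototype_sequence(frames, protos):
    """Compress a video's high-dim frame embeddings into a compact
    1D sequence of prototype indices."""
    return np.linalg.norm(
        frames[:, None, :] - protos[None, :, :], axis=-1
    ).argmin(axis=1)


def phase_order(seq):
    """Collapse consecutive repeats: the ordered phases a video visits."""
    out = []
    for s in seq:
        if not out or out[-1] != s:
            out.append(int(s))
    return out


# Two toy "videos" of the same 3-phase action at different speeds,
# with 2-D points standing in for pretrained frame embeddings.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
video_a = np.repeat(centers, 10, axis=0) + rng.normal(scale=0.05, size=(30, 2))
video_b = np.repeat(centers, 4, axis=0) + rng.normal(scale=0.05, size=(12, 2))

protos = learn_prototypes(np.vstack([video_a, video_b]), k=3)
seq_a = to_prototype_sequence(video_a, protos)
seq_b = to_prototype_sequence(video_b, protos)
```

Because both videos are reduced to sequences over the same shared prototype axis, frames can be matched by prototype index (e.g., for frame retrieval: collect the frames of `video_a` carrying the same prototype label as a query frame of `video_b`) without any exhaustive frame-to-frame comparison between the two videos.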