🤖 AI Summary
This paper addresses the problem of high-precision automatic synchronization of multi-camera video streams from consumer-grade cameras in uncontrolled environments, without dedicated hardware or manual intervention. We propose VisualSync, the first framework to jointly model generic 3D reconstruction, cross-view feature matching, and dense motion trajectory tracking. By leveraging epipolar geometry constraints from co-visible dynamic objects, it directly estimates millisecond-level inter-camera time offsets via end-to-end optimization. The method relies entirely on off-the-shelf algorithms, requiring no camera calibration, external synchronization signals, or scene instrumentation. Evaluated on four diverse real-world datasets, VisualSync achieves a median synchronization error below 50 ms, significantly outperforming existing baselines. It establishes a scalable, calibration-free synchronization paradigm for low-cost multi-view motion analysis and collaborative perception.
📝 Abstract
Today, people can easily record memorable moments, ranging from concerts and sports events to lectures, family gatherings, and birthday parties, with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos at millisecond accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once properly synchronized. To exploit this, VisualSync leverages off-the-shelf 3D reconstruction, feature matching, and dense tracking to extract tracklets, relative poses, and cross-view correspondences. It then jointly minimizes the epipolar error to estimate each camera's time offset. Experiments on four diverse, challenging datasets show that VisualSync outperforms baseline methods, achieving a median synchronization error below 50 ms.
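The core idea above — a moving 3D point co-visible in two views satisfies the epipolar constraint only at the correct time offset — can be illustrated with a minimal sketch. This is not the paper's implementation (which jointly optimizes over many tracklets and cameras end-to-end); the grid search, the linear track interpolation, and the `epipolar_residual` / `estimate_offset` helpers are all assumptions made for illustration, given a known fundamental matrix `F` between the two views.

```python
import numpy as np

def epipolar_residual(x1, x2, F):
    """Sampson-style epipolar error for homogeneous points x1 (view A)
    and x2 (view B) under fundamental matrix F; zero for a perfect match."""
    l2 = F @ x1          # epipolar line of x1 in view B
    l1 = F.T @ x2        # epipolar line of x2 in view A
    num = float(x2 @ F @ x1) ** 2
    den = l2[0]**2 + l2[1]**2 + l1[0]**2 + l1[1]**2
    return num / den

def estimate_offset(track_a, track_b, F, offsets):
    """Grid-search the time offset minimizing the mean epipolar error.

    track_a, track_b: dicts mapping timestamp -> (x, y) position of the
    same moving point in each view; track_b must be densely sampled so it
    can be linearly interpolated. offsets: candidate offsets (seconds).
    """
    tb = np.array(sorted(track_b))
    xb = np.array([track_b[t] for t in tb])
    best, best_err = None, np.inf
    for d in offsets:
        err, n = 0.0, 0
        for t, (x, y) in track_a.items():
            ts = t + d                       # shifted time in view B's clock
            if ts < tb[0] or ts > tb[-1]:
                continue                     # no overlap at this offset
            # interpolate the view-B track at the shifted time
            px = np.interp(ts, tb, xb[:, 0])
            py = np.interp(ts, tb, xb[:, 1])
            err += epipolar_residual(np.array([x, y, 1.0]),
                                     np.array([px, py, 1.0]), F)
            n += 1
        if n and err / n < best_err:
            best, best_err = d, err / n
    return best
```

For a quick sanity check, one can simulate two normalized cameras separated by a pure x-translation (so `F = [t]_x`), project a moving 3D point into both, shift view B's timestamps by a known offset, and verify that the search recovers it. The paper's actual method replaces this exhaustive search with joint optimization over all co-visible dynamic tracklets and camera pairs.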