SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses dynamic 4D scene reconstruction from multi-view, asynchronous videos. We propose the first general-purpose, prior-free 4D Gaussian Splatting framework that requires no predefined object categories or geometric priors. Methodologically, we introduce a dense 4D trajectory representation to jointly optimize temporal synchronization and geometric motion reconstruction; employ Fused Gromov-Wasserstein optimal transport for cross-video 4D feature trajectory matching; and integrate motion-spline-guided sub-frame synchronization with Gaussian Splatting rendering. Evaluated on the Panoptic Studio dataset, our method achieves an average temporal alignment error of 0.26 frames and a PSNR of 26.3, significantly improving both 4D reconstruction accuracy and photorealistic rendering quality under asynchrony. Our core contributions are: (1) the first end-to-end, prior-free multi-video 4D Gaussian modeling framework; (2) sub-frame temporal alignment; and (3) explicit, differentiable 4D motion modeling.
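The cross-video matching step pairs whole feature trajectories by comparing both their descriptors and their intra-video geometric structure. As a rough illustration (not the paper's implementation), a minimal entropic Fused Gromov-Wasserstein solver in plain NumPy might look like this; the cost matrices, the weight `alpha`, and the solver schedule are all illustrative assumptions:

```python
import numpy as np

def sinkhorn(cost, p, q, reg=0.1, n_iters=200):
    """Entropic OT: coupling with marginals p, q for a given cost matrix."""
    K = np.exp(-cost / reg)
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def fused_gw(M, C1, C2, p, q, alpha=0.3, n_outer=20, reg=0.1):
    """Minimal entropic Fused Gromov-Wasserstein matcher (square loss).

    M  : (n, m) feature cost between tracks of video 1 and video 2
    C1 : (n, n) intra-video structure, e.g. pairwise trajectory distances
    C2 : (m, m) same for the second video
    alpha trades off structure (GW term) against features (linear term).
    Returns a soft coupling T; T[i, k] is the match weight of track i to k.
    """
    T = np.outer(p, q)  # product coupling as initialization
    # constant part of the square-loss GW gradient
    const = (C1 ** 2 @ p)[:, None] + (q @ C2 ** 2)[None, :]
    for _ in range(n_outer):
        grad_gw = const - 2.0 * C1 @ T @ C2.T   # GW gradient at current T
        cost = (1 - alpha) * M + alpha * grad_gw
        T = sinkhorn(cost, p, q, reg=reg)       # projected update
    return T
```

In practice `M` would come from learned track descriptors and `C1`, `C2` from pairwise trajectory distances within each video; the soft coupling `T` then yields cross-video track correspondences, e.g. via row-wise argmax.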

📝 Abstract
Modeling dynamic 3D scenes is challenging due to their high-dimensional nature, which requires aggregating information from multiple views to reconstruct time-evolving 3D geometry and motion. We present a novel multi-video 4D Gaussian Splatting (4DGS) approach designed to handle real-world, unsynchronized video sets. Our approach, SyncTrack4D, directly leverages a dense 4D track representation of dynamic scene parts as a cue for simultaneous cross-video synchronization and 4DGS reconstruction. We first compute dense per-video 4D feature tracks and establish cross-video track correspondences using a Fused Gromov-Wasserstein optimal transport approach. Next, we perform global frame-level temporal alignment that maximizes the overlapping motion of matched 4D tracks. Finally, we achieve sub-frame synchronization through our multi-video 4D Gaussian Splatting built upon a motion-spline scaffold representation. The final output is a synchronized 4DGS representation with dense, explicit 3D trajectories and per-video temporal offsets. We evaluate our approach on the Panoptic Studio and SyncNeRF Blender datasets, demonstrating sub-frame synchronization accuracy with an average temporal error below 0.26 frames, and high-fidelity 4D reconstruction reaching a PSNR of 26.3 on Panoptic Studio. To the best of our knowledge, ours is the first general 4D Gaussian Splatting approach for unsynchronized video sets that does not assume predefined scene objects or prior models.
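The frame-level alignment stage searches for the integer offset that maximizes the overlapping motion of matched tracks. A simple stand-in for that idea (an illustrative sketch, not the authors' objective) is normalized cross-correlation of the per-frame speeds of a matched track pair:

```python
import numpy as np

def frame_offset(traj_a, traj_b, max_shift=10):
    """Estimate the integer frame offset between two matched 3D tracks.

    traj_a, traj_b : (T, 3) arrays of 3D positions over time.
    Returns the shift of traj_b (in frames) that best aligns its
    per-frame motion magnitude with traj_a's, scored by normalized
    cross-correlation over the overlapping window.
    """
    # per-frame speed serves as the synchronization cue
    va = np.linalg.norm(np.diff(traj_a, axis=0), axis=1)
    vb = np.linalg.norm(np.diff(traj_b, axis=0), axis=1)
    va = (va - va.mean()) / (va.std() + 1e-8)
    vb = (vb - vb.mean()) / (vb.std() + 1e-8)
    best_s, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = va[s:], vb[:len(vb) - s]
        else:
            a, b = va[:len(va) + s], vb[-s:]
        n = min(len(a), len(b))
        score = float(a[:n] @ b[:n]) / n   # mean product of standardized speeds
        if score > best_score:
            best_s, best_score = s, score
    return best_s
```

A real multi-video system would aggregate such scores over many matched tracks and all video pairs before committing to a global set of offsets.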
Problem

Research questions and friction points this paper is trying to address.

Synchronizes unsynchronized multi-video sets for 4D reconstruction
Aligns cross-video motion to model dynamic 3D scenes
Reconstructs time-evolving geometry without predefined objects or prior models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses dense 4D feature tracks for cross-video synchronization
Applies Fused Gromov-Wasserstein optimal transport for track correspondences
Employs motion-spline scaffold for multi-video 4D Gaussian splatting
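To give intuition for the motion-spline scaffold, here is a toy sketch of sub-frame refinement: fit a continuous spline through each track and grid-search a fractional offset that best aligns the resampled positions. The Catmull-Rom spline form and the grid search are illustrative assumptions; in the paper the offsets are optimized jointly and differentiably within the 4DGS pipeline:

```python
import numpy as np

def catmull_rom(traj, t):
    """Evaluate a uniform Catmull-Rom spline through control points traj
    (shape (T, 3)) at continuous time t (valid roughly for 1 <= t <= T-2)."""
    i = int(np.clip(np.floor(t), 1, len(traj) - 3))
    u = t - i
    p0, p1, p2, p3 = traj[i - 1], traj[i], traj[i + 1], traj[i + 2]
    return 0.5 * (2 * p1
                  + (-p0 + p2) * u
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * u ** 3)

def subframe_offset(traj_a, traj_b, coarse=0.0, search=1.0, step=0.05):
    """Refine a coarse (integer) offset to sub-frame precision by
    grid-searching a continuous offset d and comparing spline-resampled
    positions of the two matched tracks."""
    ts = np.arange(2.0, len(traj_a) - 4.0, 0.5)  # stay inside both splines
    best_d, best_err = coarse, np.inf
    for d in np.arange(coarse - search, coarse + search + 1e-9, step):
        err = np.mean([
            np.linalg.norm(catmull_rom(traj_a, t) - catmull_rom(traj_b, t + d))
            for t in ts
        ])
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```

Because the spline is continuous in time, the same construction also supports rendering the 4DGS scene at arbitrary fractional timestamps once the per-video offsets are known.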