Time-Correlated Video Bridge Matching

📅 2025-10-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models excel at noise-to-data generation but face limitations in cross-distribution temporal data transformation—such as video editing—particularly in preserving temporal coherence. This work introduces Video Bridge Matching, the first extension of Bridge Matching to sequential data, explicitly modeling inter-frame temporal dependencies for high-fidelity video generation and editing. Our framework integrates the diffusion bridge mechanism with dedicated temporal correlation modeling, enabling robust learning of dynamic structure across frames. We demonstrate state-of-the-art performance on frame interpolation, image-to-video synthesis, and video super-resolution, achieving significant improvements in PSNR (+2.1 dB), SSIM (+0.045), and perceptual quality over conventional diffusion and GAN-based baselines. Crucially, Video Bridge Matching resolves the long-standing challenge of temporal consistency in cross-distribution video translation, establishing a new paradigm for structured spatiotemporal generative modeling.

Technology Category

Application Category

📝 Abstract
Diffusion models excel in noise-to-data generation tasks, providing a mapping from a Gaussian distribution to a more complex data distribution. However they struggle to model translations between complex distributions, limiting their effectiveness in data-to-data tasks. While Bridge Matching (BM) models address this by finding the translation between data distributions, their application to time-correlated data sequences remains unexplored. This is a critical limitation for video generation and manipulation tasks, where maintaining temporal coherence is particularly important. To address this gap, we propose Time-Correlated Video Bridge Matching (TCVBM), a framework that extends BM to time-correlated data sequences in the video domain. TCVBM explicitly models inter-sequence dependencies within the diffusion bridge, directly incorporating temporal correlations into the sampling process. We compare our approach to classical methods based on bridge matching and diffusion models for three video-related tasks: frame interpolation, image-to-video generation, and video super-resolution. TCVBM achieves superior performance across multiple quantitative metrics, demonstrating enhanced generation quality and reconstruction fidelity.
Problem

Research questions and friction points this paper is trying to address.

Modeling translations between complex data distributions
Extending Bridge Matching to time-correlated video sequences
Maintaining temporal coherence in video generation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends bridge matching to time-correlated video sequences
Models inter-sequence dependencies within diffusion bridge framework
Incorporates temporal correlations directly into sampling process
🔎 Similar Papers
No similar papers found.