ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

📅 2024-10-08

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 1

🤖 AI Summary

Existing image-to-video diffusion models condition on a single frame, so they must be adapted for keyframe interpolation constrained by both a start and an end frame. Prior adaptations that fuse forward and backward sampling paths in parallel tend to drift off the data manifold, which produces artifacts or requires multiple iterative re-noising steps. Method: We propose a sequential bidirectional diffusion sampling strategy. It samples along the forward path conditioned on the start frame and along the backward path conditioned on the end frame one after the other rather than in parallel, which avoids extensive re-noising and any fine-tuning. The approach also integrates CFG++ classifier-free guidance and DDS (Decomposed Diffusion Sampler) to improve temporal coherence and on-manifold generation. Contribution/Results: On a single RTX 3090 GPU, the method interpolates 25 frames at 1024×576 resolution in 195 seconds. It achieves state-of-the-art keyframe interpolation with improved efficiency and visual quality.
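The CFG++ component can be illustrated with a single deterministic DDIM update. The sketch below follows the published CFG++ update for general diffusion models and is not ViBiDSampler's code. The guided noise estimate forms the clean-sample estimate, but the unconditional estimate is used for re-noising. The guidance scale `lam` is assumed to lie in [0, 1]. The function names are hypothetical, and plain NumPy arrays stand in for the denoiser's conditional and unconditional noise predictions.

```python
import numpy as np


def cfg_ddim_step(x_t, eps_uncond, eps_cond, alpha_bar_t, alpha_bar_prev, w):
    """Standard CFG with a deterministic DDIM step (shown for comparison)."""
    eps_guided = eps_uncond + w * (eps_cond - eps_uncond)
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_guided) / np.sqrt(alpha_bar_t)
    # Standard CFG re-noises with the same guided estimate.
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps_guided


def cfg_pp_ddim_step(x_t, eps_uncond, eps_cond, alpha_bar_t, alpha_bar_prev, lam):
    """CFG++-style DDIM step (sketch); lam is assumed to lie in [0, 1]."""
    # Guided noise estimate, interpolating from unconditional toward conditional.
    eps_guided = eps_uncond + lam * (eps_cond - eps_uncond)
    # Clean-sample (Tweedie) estimate computed from the guided noise.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_guided) / np.sqrt(alpha_bar_t)
    # CFG++ re-noises with the unconditional estimate instead of the guided one.
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps_uncond
```

The only difference from standard CFG is the noise term in the return line. CFG++ credits the use of `eps_uncond` there with keeping samples closer to the data manifold.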

📝 Abstract

Recent progress in large-scale text-to-video (T2V) and image-to-video (I2V) diffusion models has greatly enhanced video generation, especially in terms of keyframe interpolation. However, current image-to-video diffusion models, while powerful in generating videos from a single conditioning frame, need adaptation for two-frame (start&end) conditioned generation, which is essential for effective bounded interpolation. Unfortunately, existing approaches that fuse temporally forward and backward paths in parallel often suffer from off-manifold issues, leading to artifacts or requiring multiple iterative re-noising steps. In this work, we introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning. Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames. Additionally, we incorporate advanced guidance techniques, CFG++ and DDS, to further enhance the interpolation process. By integrating these, our method achieves state-of-the-art performance, efficiently generating high-quality, smooth videos between keyframes. On a single 3090 GPU, our method can interpolate 25 frames at 1024 x 576 resolution in just 195 seconds, establishing it as a leading solution for keyframe interpolation.
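Here is a minimal sketch of the sequential bidirectional idea. At each timestep, the clip is first denoised one step along the forward path conditioned on the start frame. It is then re-noised once back to the current noise level, temporally reversed, and denoised along the backward path conditioned on the end frame. This ordering is our reading of the abstract, not the authors' released code. `denoise_step` is a hypothetical stand-in for one conditioned image-to-video denoising step. `toy_denoise_step` is a placeholder that only pins the first frame of its input to the conditioning frame, so that the example runs.

```python
import numpy as np


def renoise(x_prev, alpha_bar_t, alpha_bar_prev, rng):
    """Diffuse a sample at level t-1 forward to level t (one re-noising step)."""
    ratio = alpha_bar_t / alpha_bar_prev
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(ratio) * x_prev + np.sqrt(1.0 - ratio) * noise


def bidirectional_sample(x_T, alpha_bars, denoise_step, start_frame, end_frame, rng):
    """Sequential (not parallel) forward/backward sampling over a clip (sketch).

    x_T has shape (num_frames, latent_dim) with the frame axis first.
    alpha_bars[t] is the cumulative signal level at step t, decreasing in t.
    denoise_step(x, t, cond) is a hypothetical one-step denoiser mapping
    level t to level t-1 and treating cond as the clip's first frame.
    """
    x = x_T
    for t in range(len(alpha_bars) - 1, 0, -1):
        # Forward path: denoise t -> t-1 conditioned on the start frame.
        x_prev = denoise_step(x, t, start_frame)
        # One re-noising step back to level t before switching direction.
        x_t = renoise(x_prev, alpha_bars[t], alpha_bars[t - 1], rng)
        # Backward path: reverse time so the end frame leads, denoise, reverse back.
        x = denoise_step(x_t[::-1], t, end_frame)[::-1]
    return x


def toy_denoise_step(x, t, cond):
    """Hypothetical placeholder denoiser: shrink the clip and pin frame 0 to cond."""
    out = 0.9 * x
    out[0] = cond
    return out


rng = np.random.default_rng(0)
alpha_bars = np.linspace(0.999, 0.01, 50)  # toy schedule: index 0 is nearly clean
start, end = np.zeros(4), np.ones(4)
x_T = rng.standard_normal((25, 4))  # 25 frames, matching the reported setting
video = bidirectional_sample(x_T, alpha_bars, toy_denoise_step, start, end, rng)
```

The two directions are applied one after the other within each step, so every intermediate latent comes from an ordinary single-condition denoising step. This is the intuition the abstract gives for staying on-manifold. A parallel scheme would instead fuse two independently computed estimates.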

Problem

Research questions and friction points this paper is trying to address.

Adapt image-to-video diffusion models, which condition on a single frame, to keyframe interpolation conditioned on both a start and an end frame.

Address off-manifold issues in two-frame conditioned video generation.

Improve quality and efficiency of intermediate frame generation.

Innovation

Methods, ideas, or system contributions that make the work stand out.

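Sequential bidirectional sampling (forward path conditioned on the start frame, backward path conditioned on the end frame) for coherent, on-manifold intermediate frames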

Integration of the CFG++ and DDS guidance techniques

Efficient inference: 25 frames at 1024×576 interpolated in 195 seconds on a single RTX 3090 GPU
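DDS (Decomposed Diffusion Sampler) refines the denoised estimate before the next DDIM update. It does this with a few conjugate-gradient (CG) steps on a data-consistency objective, initialized at the Tweedie estimate. The sketch below shows only that CG step in NumPy. How ViBiDSampler defines the measurement for keyframes is an assumption here: the hypothetical operator `A` selects the first frame of a flattened clip, and `y` is a stand-in for the start keyframe latent. `gamma` is an optional proximal weight and is also an assumption; with `gamma=0`, the step is plain CG on the normal equations.

```python
import numpy as np


def cg_data_consistency(x0_hat, A, y, gamma=0.0, num_steps=5, tol=1e-12):
    """Refine a denoised estimate x0_hat with a few CG steps (DDS-style sketch).

    Approximately solves (A^T A + gamma I) x = A^T y + gamma x0_hat,
    i.e. min_x ||y - A x||^2 + gamma ||x - x0_hat||^2, starting from x0_hat.
    """
    def op(v):
        # Normal-equation operator, applied through products with A and A.T.
        return A.T @ (A @ v) + gamma * v

    b = A.T @ y + gamma * x0_hat
    x = x0_hat.copy()
    r = b - op(x)
    p = r.copy()
    rs_old = r @ r
    for _ in range(num_steps):
        if rs_old < tol:
            break  # residual already negligible
        Ap = op(p)
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# Hypothetical setup: 5 frames of 3-dim latents, flattened frame-major.
F, D = 5, 3
rng = np.random.default_rng(1)
x0_hat = rng.standard_normal(F * D)
A = np.eye(F * D)[:D]          # selects the first frame
y = np.array([1.0, 2.0, 3.0])  # stand-in for the start keyframe latent
x = cg_data_consistency(x0_hat, A, y, gamma=0.5)
```

With `A` a frame selector, `A.T @ A` is a projection, so CG converges in a single step. The selected frame moves to `(y + gamma * x0_hat) / (1 + gamma)`, and the other frames stay unchanged. For a real latent video, the state is far larger, which is why DDS uses a few CG iterations rather than forming and inverting the matrix.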

🔎 Similar Papers

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation