Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

📅 2024-08-27
🏛️ arXiv.org
📈 Citations: 7
Influential: 3
🤖 AI Summary
To generate coherent motion between a pair of keyframes, this paper adapts a pretrained large-scale image-to-video diffusion model, originally trained to generate video moving forward in time from a single image, for keyframe interpolation. A lightweight fine-tuning step, rather than full retraining, produces a second version of the model that predicts video moving backward in time from a single image. The forward and backward models are then used together in a dual-directional diffusion sampling process that starts from each of the two keyframes and combines the overlapping estimates for the intermediate frames. In the authors' experiments, the method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.

📝 Abstract
We present a method for generating video sequences with coherent motion between a pair of input key frames. We adapt a pretrained large-scale image-to-video diffusion model (originally trained to generate videos moving forward in time from a single input image) for key frame interpolation, i.e., to produce a video in between two input frames. We accomplish this adaptation through a lightweight fine-tuning technique that produces a version of the model that instead predicts videos moving backwards in time from a single input image. This model (along with the original forward-moving model) is subsequently used in a dual-directional diffusion sampling process that combines the overlapping model estimates starting from each of the two keyframes. Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.
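The dual-directional sampling idea can be illustrated with a minimal sketch. It assumes that, at each step, the forward model estimates the clip starting from the first keyframe, the backward model estimates the time-reversed clip starting from the second keyframe, and the two estimates are flipped into the same time order and averaged. The abstract only says the overlapping estimates are combined, so the equal-weight average, the names `fused_denoise_step`, `toy_model`, and `step_size`, and the toy stand-in model are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fused_denoise_step(x, first_key, last_key, forward_model, backward_model, step_size):
    """One sampling step that fuses forward and backward estimates (hypothetical helper).

    x: current noisy clip, shape (T, H, W, C), stored in forward time order.
    forward_model(clip, key): residual estimate for a clip that starts at `key`.
    backward_model(clip, key): residual estimate for a clip that starts at `key`
        and plays backwards in time.
    """
    # Forward estimate, conditioned on the first keyframe.
    eps_fwd = forward_model(x, first_key)
    # The backward model sees the clip in reversed time order,
    # so that the second keyframe is its "first" frame.
    eps_bwd_reversed = backward_model(x[::-1], last_key)
    # Flip the backward estimate so both estimates share forward time order.
    eps_bwd = eps_bwd_reversed[::-1]
    # Assumption: combine the overlapping estimates with an equal-weight average.
    eps = 0.5 * (eps_fwd + eps_bwd)
    return x - step_size * eps


def toy_model(clip, key):
    """Stand-in for a diffusion network (NOT a real model).

    It pulls the clip toward a target that equals `key` on the first frame
    and fades linearly to zero on the last frame.
    """
    num_frames = clip.shape[0]
    weights = np.linspace(1.0, 0.0, num_frames).reshape(num_frames, 1, 1, 1)
    target = weights * key
    return clip - target


# Tiny example: 5 frames of 2x2 single-channel "video".
first_key = np.ones((2, 2, 1))
last_key = 3.0 * np.ones((2, 2, 1))
x0 = np.zeros((5, 2, 2, 1))
out = fused_denoise_step(x0, first_key, last_key, toy_model, toy_model, step_size=1.0)
```

In this toy setting the same stand-in is used for both directions; in the paper the backward-in-time model is a separately fine-tuned version of the forward model. The point of the sketch is only the time reversal around the backward call, which keeps the two estimates aligned frame by frame before they are combined.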
Problem

Research questions and friction points this paper is trying to address.

Generating video sequences with coherent motion between two keyframes
Adapting forward-only image-to-video models to keyframe interpolation
Combining estimates that start from each of the two keyframes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Keyframe interpolation technique
Dual-directional diffusion sampling
Lightweight fine-tuning adaptation