Time-adaptive Video Frame Interpolation based on Residual Diffusion

📅 2025-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of inaccurate interpolation time estimation in traditional hand-drawn animation video frame interpolation (VFI), primarily caused by abrupt motion changes. To this end, we propose a time-adaptive lightweight residual diffusion framework. Methodologically, it explicitly models and adaptively learns the optimal interpolation time—the first such approach in VFI—and integrates the ResShift diffusion paradigm, enabling high-fidelity intermediate frame synthesis in only ~10 denoising steps. Furthermore, it incorporates optical flow-guided feature alignment and pixel-wise uncertainty modeling to yield interpretable confidence maps. Extensive experiments on animation benchmarks demonstrate significant improvements over state-of-the-art methods, achieving an effective balance among high-fidelity interpolation quality, rapid inference (≈10 steps), and physically grounded interpretability.

Technology Category

Application Category

📝 Abstract
In this work, we propose a new diffusion-based method for video frame interpolation (VFI), in the context of traditional hand-made animation. We introduce three main contributions: The first is that we explicitly handle the interpolation time in our model, which we also re-estimate during the training process, to cope with the particularly large variations observed in the animation domain, compared to natural videos; The second is that we adapt and generalize a diffusion scheme called ResShift recently proposed in the super-resolution community to VFI, which allows us to perform a very low number of diffusion steps (in the order of 10) to produce our estimates; The third is that we leverage the stochastic nature of the diffusion process to provide a pixel-wise estimate of the uncertainty on the interpolated frame, which could be useful to anticipate where the model may be wrong. We provide extensive comparisons with respect to state-of-the-art models and show that our model outperforms these models on animation videos.
Problem

Research questions and friction points this paper is trying to address.

Handling large time variations in animation frame interpolation
Adapting ResShift diffusion for efficient video frame interpolation
Providing pixel-wise uncertainty estimates for interpolated frames
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time-adaptive interpolation handling large animation variations
ResShift diffusion scheme for low-step VFI
Pixel-wise uncertainty estimation via diffusion stochasticity