🤖 AI Summary
This work addresses the challenge of preserving both fine-detail fidelity and temporal motion consistency in video frame interpolation at high interpolation factors (4×/8×) and high resolution (2560×1440), where existing methods often suffer from structural distortions or incoherent motion. To this end, the authors propose FC-VFI, which builds on a pre-trained video diffusion model and captures temporal dependencies in latent space so that structural details from the start and end frames are retained. The method uses semantic matching lines to provide structure-aware motion guidance and introduces a temporal difference loss to improve temporal consistency. According to the authors, extensive experiments show that FC-VFI interpolates 30 FPS videos to 120/240 FPS with high visual quality across diverse scenarios while maintaining structural integrity and perceptual fidelity.
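The frame-rate arithmetic behind the 4×/8× figures can be checked with a short sketch. It assumes uniform temporal spacing of the inserted frames, which the summary does not state explicitly; the function names `new_frames_per_pair` and `interpolated_timestamps` are hypothetical helpers, not part of FC-VFI.

```python
from fractions import Fraction


def new_frames_per_pair(factor: int) -> int:
    """Number of frames inserted between two adjacent source frames."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return factor - 1


def interpolated_timestamps(src_fps: int, factor: int, num_src_frames: int) -> list[Fraction]:
    """Timestamps (in seconds) of every output frame, assuming uniform spacing."""
    if src_fps < 1 or num_src_frames < 1:
        raise ValueError("src_fps and num_src_frames must be >= 1")
    out_fps = src_fps * factor
    # Each adjacent source pair gains (factor - 1) new frames,
    # so the output has (num_src_frames - 1) * factor + 1 frames in total.
    num_out = (num_src_frames - 1) * new_frames_per_pair(factor) + num_src_frames
    return [Fraction(i, out_fps) for i in range(num_out)]


if __name__ == "__main__":
    for factor in (4, 8):
        stamps = interpolated_timestamps(src_fps=30, factor=factor, num_src_frames=2)
        print(f"{factor}x: {30 * factor} FPS, {new_frames_per_pair(factor)} new frames per pair")
        print("  timestamps:", [str(t) for t in stamps])
```

Under this assumption, 4× interpolation inserts 3 frames between each source pair and 8× inserts 7, which corresponds to the 120 and 240 FPS targets.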
📝 Abstract
Large pre-trained video diffusion models excel at video frame interpolation but struggle to generate high-fidelity frames because they rely on intrinsic generative priors, which limits how much detail is preserved from the start and end frames. Existing methods often depend on motion control for temporal consistency, yet dense optical flow is error-prone and sparse points lack structural context. In this paper, we propose FC-VFI for faithful and consistent video frame interpolation. It supports \(4\times\) and \(8\times\) interpolation, boosting frame rates from 30 FPS to 120 and 240 FPS at \(2560\times 1440\) resolution while preserving visual fidelity and motion consistency. We introduce a temporal modeling strategy on the latent sequences to inherit fidelity cues from the start and end frames, and we leverage semantic matching lines for structure-aware motion guidance, improving motion consistency. Furthermore, we propose a temporal difference loss to mitigate temporal inconsistencies. Extensive experiments show that FC-VFI achieves high performance and structural integrity across diverse scenarios.
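The abstract does not give the formula for the temporal difference loss. As a rough illustration of the general idea only, the sketch below penalizes the mismatch between frame-to-frame changes in a predicted sequence and in a reference sequence. The L1 norm, the first-order difference, and the name `temporal_difference_loss` are assumptions made here, not the authors' definition.

```python
import numpy as np


def temporal_difference_loss(pred, target) -> float:
    """Mean absolute error between first-order temporal differences.

    Both inputs have shape (T, ...) with the time axis first.
    Assumed formulation for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    if pred.shape[0] < 2:
        raise ValueError("need at least two frames to take a temporal difference")
    # Frame-to-frame change along the time axis: x[t + 1] - x[t].
    d_pred = np.diff(pred, axis=0)
    d_target = np.diff(target, axis=0)
    return float(np.mean(np.abs(d_pred - d_target)))


if __name__ == "__main__":
    # Reference: a smooth brightness ramp over 5 tiny 2x2 frames.
    target = np.linspace(0.0, 1.0, 5).reshape(5, 1, 1) * np.ones((5, 2, 2))
    flicker = target.copy()
    flicker[2] += 0.5  # one frame is suddenly brighter
    print("identical sequences:", temporal_difference_loss(target, target))
    print("flickering sequence:", temporal_difference_loss(flicker, target))
```

A term of this kind is insensitive to a constant offset shared by all frames and responds only to changes between consecutive frames, so it would typically be combined with a per-frame reconstruction loss rather than used alone.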