🤖 AI Summary
Existing video inpainting methods struggle with severe content degradation, exhibiting spatiotemporal inconsistency and weak control over later frames.
Method: This paper proposes a decoupled inpainting framework that separates the task into multi-frame-consistent image inpainting and motion propagation within occluded regions. It introduces an inter-frame prior mechanism built from two components: a CoSpliced module that enables controllable semantic diffusion from the first frame into the reference frames, and a context controller that imposes deformation constraints during generation. The framework further integrates image-to-video generation priors, frame-copy encoding, stitching guidance, and spatiotemporal feature injection into a diffusion-based backbone.
Results: Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches across diverse degradation scenarios, particularly achieving superior spatial coherence and motion stability in long-sequence video inpainting.
📝 Abstract
Recent video inpainting methods often employ image-to-video (I2V) priors to model temporal consistency across masked frames. While effective in moderate cases, these methods struggle under severe content degradation and tend to overlook spatiotemporal stability, resulting in insufficient control over the latter parts of the video. To address these limitations, we decouple video inpainting into two sub-tasks: multi-frame consistent image inpainting and masked-area motion propagation. We propose VidSplice, a novel framework that introduces spaced-frame priors to guide the inpainting process with spatiotemporal cues. To enhance spatial coherence, we design a CoSpliced Module that performs a first-frame propagation strategy, diffusing the initial frame content into subsequent reference frames through a splicing mechanism. Additionally, we introduce a dedicated context controller module that encodes coherent priors after frame duplication and injects the spliced video into the I2V generative backbone, effectively constraining content distortion during generation. Extensive evaluations demonstrate that VidSplice achieves competitive performance across diverse video inpainting scenarios. Moreover, its design significantly improves both foreground alignment and motion stability, outperforming existing approaches.
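To make the first-frame propagation idea concrete, here is a minimal sketch of what a spaced-frame splicing step could look like. This is an illustrative assumption, not the paper's implementation: the function name, the `spacing` parameter, and the array conventions are all hypothetical. It assumes the first frame has already been inpainted, and copies its content into the occluded regions of every spaced reference frame to form the spliced conditioning video that would be fed to the I2V backbone.

```python
import numpy as np

def splice_first_frame_prior(frames, masks, spacing=4):
    """Hypothetical sketch of spaced-frame splicing.

    frames: (T, H, W, C) float array; occluded pixels are zeroed.
            Frame 0 is assumed to be fully inpainted already.
    masks:  (T, H, W, 1) binary array; 1 marks occluded pixels.
    Every `spacing`-th frame is treated as a reference frame whose
    occluded region is filled by copying from the first frame.
    """
    spliced = frames.copy()
    first = frames[0]
    for t in range(spacing, len(frames), spacing):
        m = masks[t]
        # keep the known pixels of frame t, paste first-frame
        # content into the occluded region
        spliced[t] = frames[t] * (1 - m) + first * m
    return spliced
```

In a full pipeline, the spliced video would then be encoded by the context controller and injected into the diffusion backbone as a spatiotemporal condition; intermediate (non-reference) frames are left for the motion-propagation sub-task.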