🤖 AI Summary
Existing video inpainting methods struggle with severe content degradation, exhibiting spatiotemporal inconsistency and weak control over later frames.
Method: This paper proposes a decoupled inpainting framework that separates the task into multi-frame-consistent image inpainting and motion propagation within occluded regions. It introduces an inter-frame prior mechanism built from two components: a CoSpliced module that enables controllable semantic diffusion from the first frame into the reference frames, and a context controller that imposes deformation constraints during generation. The framework further integrates image-to-video generation priors, frame-copy encoding, stitching guidance, and spatiotemporal feature injection into a diffusion-based backbone.
Results: Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches across diverse degradation scenarios, particularly achieving superior spatial coherence and motion stability in long-sequence video inpainting.
📝 Abstract
Recent video inpainting methods often employ image-to-video (I2V) priors to model temporal consistency across masked frames. While effective in moderate cases, these methods struggle under severe content degradation and tend to overlook spatiotemporal stability, resulting in insufficient control over the latter parts of the video. To address these limitations, we decouple video inpainting into two sub-tasks: multi-frame consistent image inpainting and masked-area motion propagation. We propose VidSplice, a novel framework that introduces spaced-frame priors to guide the inpainting process with spatiotemporal cues. To enhance spatial coherence, we design a CoSpliced Module that performs a first-frame propagation strategy, diffusing the initial frame content into subsequent reference frames through a splicing mechanism. Additionally, we introduce a dedicated context controller module that encodes coherent priors after frame duplication and injects the spliced video into the I2V generative backbone, effectively constraining content distortion during generation. Extensive evaluations demonstrate that VidSplice achieves competitive performance across diverse video inpainting scenarios. Moreover, its design significantly improves both foreground alignment and motion stability, outperforming existing approaches.
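To make the first-frame propagation idea concrete, here is a minimal sketch of what a spaced-frame splicing step could look like. This is an illustrative assumption, not the paper's implementation: the function name, the `spacing` parameter, and the array conventions are all hypothetical. It assumes the first frame has already been inpainted, and copies its content into the occluded regions of every spaced reference frame to form the spliced conditioning video that would be fed to the I2V backbone.

```python
import numpy as np

def splice_first_frame_prior(frames, masks, spacing=4):
    """Hypothetical sketch of spaced-frame splicing.

    frames: (T, H, W, C) float array; occluded pixels are zeroed.
            Frame 0 is assumed to be fully inpainted already.
    masks:  (T, H, W, 1) binary array; 1 marks occluded pixels.
    Every `spacing`-th frame is treated as a reference frame whose
    occluded region is filled by copying from the first frame.
    """
    spliced = frames.copy()
    first = frames[0]
    for t in range(spacing, len(frames), spacing):
        m = masks[t]
        # keep the known pixels of frame t, paste first-frame
        # content into the occluded region
        spliced[t] = frames[t] * (1 - m) + first * m
    return spliced
```

In a full pipeline, the spliced video would then be encoded by the context controller and injected into the diffusion backbone as a spatiotemporal condition; intermediate (non-reference) frames are left for the motion-propagation sub-task.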