VidSplice: Towards Coherent Video Inpainting via Explicit Spaced Frame Guidance

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video inpainting methods struggle under severe content degradation, exhibiting spatiotemporal inconsistency and weak control over later frames. Method: this paper proposes a decoupled inpainting framework that separates the task into multi-frame-consistent image inpainting and motion propagation within occluded regions. It introduces an inter-frame prior mechanism, with a CoSpliced Module and a context controller that enable controllable semantic diffusion from the first frame into reference frames and impose deformation constraints during generation. It further integrates image-to-video generation priors, frame-copy encoding, stitching guidance, and spatiotemporal feature injection into a diffusion-based backbone. Results: extensive experiments demonstrate that the method significantly outperforms state-of-the-art approaches across diverse degradation scenarios, achieving particularly strong spatial coherence and motion stability in long-sequence video inpainting.

📝 Abstract
Recent video inpainting methods often employ image-to-video (I2V) priors to model temporal consistency across masked frames. While effective in moderate cases, these methods struggle under severe content degradation and tend to overlook spatiotemporal stability, resulting in insufficient control over the latter parts of the video. To address these limitations, we decouple video inpainting into two sub-tasks: multi-frame-consistent image inpainting and masked-area motion propagation. We propose VidSplice, a novel framework that introduces spaced-frame priors to guide the inpainting process with spatiotemporal cues. To enhance spatial coherence, we design a CoSpliced Module that performs a first-frame propagation strategy, diffusing the initial frame content into subsequent reference frames through a splicing mechanism. Additionally, we introduce a context controller module that encodes coherent priors after frame duplication and injects the spliced video into the I2V generative backbone, effectively constraining content distortion during generation. Extensive evaluations demonstrate that VidSplice achieves competitive performance across diverse video inpainting scenarios. Moreover, its design significantly improves both foreground alignment and motion stability, outperforming existing approaches.
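The spaced-frame prior described above can be illustrated with a minimal sketch: the inpainted first frame is spliced into the masked region of every k-th reference frame, producing anchors that a downstream I2V backbone would then refine. This is a simplified illustration only, not the paper's implementation; the function name `splice_spaced_priors`, the array layout, and the fixed stride are all assumptions for the sake of the example.

```python
import numpy as np

def splice_spaced_priors(frames: np.ndarray, inpainted_first: np.ndarray,
                         mask: np.ndarray, stride: int = 4) -> np.ndarray:
    """Copy the inpainted first-frame content into the masked region of
    every `stride`-th frame, leaving the remaining frames untouched.

    frames:          (T, H, W, C) degraded video
    inpainted_first: (H, W, C) first frame after image inpainting
    mask:            (H, W) boolean hole mask, True = missing content
    """
    spliced = frames.copy()
    for t in range(0, len(frames), stride):
        # Spaced frames receive first-frame content as an explicit prior.
        spliced[t][mask] = inpainted_first[mask]
    return spliced
```

In the actual method these spliced frames would serve as conditioning for the generative backbone, which propagates motion into the intermediate frames; here the sketch only shows the splicing step itself.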
Problem

Research questions and friction points this paper is trying to address.

Addresses video inpainting challenges under severe content degradation
Enhances spatiotemporal stability and coherence in inpainted videos
Improves foreground alignment and motion stability in video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples video inpainting into multi-frame-consistent image inpainting and masked-area motion propagation
Introduces spaced-frame priors for spatiotemporal guidance
Uses a CoSpliced Module to propagate first-frame content into reference frames