Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing video outpainting methods struggle to maintain intra-frame and inter-frame consistency in dynamic scenes and large-scale extrapolations because they rely on implicit temporal modeling and have limited spatial context. This work presents a unified framework that integrates the propagation and generation paradigms. It combines optical flow-based propagation with reference-guided latent propagation, which preserves the original visible content while producing spatiotemporally coherent, photorealistic outpainted results. A flow completion network pre-trained for video inpainting is incorporated into a diffusion-based generative framework and fine-tuned end-to-end to bridge the domain gap, which improves temporal consistency and the reliability of generation. Experiments show that the approach outperforms state-of-the-art methods in visual realism, temporal coherence, and inference efficiency, without requiring input-specific adaptation.
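The "keep the seen" half of flow-based propagation can be illustrated with a small NumPy sketch. It assumes a frame padded symmetrically on all sides, a single source frame, a dense backward flow field that has already been completed, and nearest-neighbour sampling. The helper names `pad_frame` and `propagate` are hypothetical and this is not the authors' implementation; the flow completion network and the diffusion generator are not modeled.

```python
import numpy as np


def pad_frame(frame: np.ndarray, pad: int):
    """Place an H x W x C frame in the centre of a larger canvas.

    Returns the canvas and a boolean mask that is True on the originally
    visible ("seen") pixels and False on the region to be outpainted.
    """
    h, w, c = frame.shape
    canvas = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=frame.dtype)
    canvas[pad:pad + h, pad:pad + w] = frame
    seen = np.zeros((h + 2 * pad, w + 2 * pad), dtype=bool)
    seen[pad:pad + h, pad:pad + w] = True
    return canvas, seen


def propagate(target, target_seen, source, source_seen, flow):
    """Backward-warp seen pixels of `source` into unseen pixels of `target`.

    Assumed convention (hypothetical): flow[y, x] = (dx, dy) maps target
    pixel (x, y) to source pixel (x + dx, y + dy). Seen target pixels are
    never overwritten. Pixels whose mask stays False are left for a generator.
    """
    H, W = target_seen.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest-neighbour source coordinates for every target pixel.
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    inside = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
    sx_c = np.clip(sx, 0, W - 1)
    sy_c = np.clip(sy, 0, H - 1)
    # Only copy from source pixels that are themselves observed content.
    valid_src = inside & source_seen[sy_c, sx_c]
    fill = (~target_seen) & valid_src
    out = target.copy()
    out[fill] = source[sy_c[fill], sx_c[fill]]
    return out, target_seen | fill
```

With a uniform horizontal flow, seen content from the source frame fills the outpainted columns it maps onto, while pixels that map outside the source's seen region stay marked as unseen for the generative stage.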


📝 Abstract

Video outpainting aims to expand the visible content of a video beyond the original frame boundaries while preserving spatial fidelity and temporal coherence across frames. Existing methods primarily rely on large-scale generative models, such as diffusion models. However, generation-based approaches suffer from implicit temporal modeling and limited spatial context. These limitations lead to intra-frame and inter-frame inconsistencies, which become particularly pronounced in dynamic scenes and large outpainting scenarios. To overcome these challenges, we propose Seen-to-Scene, a novel framework that unifies propagation-based and generation-based paradigms for video outpainting. Specifically, Seen-to-Scene leverages flow-based propagation with a flow completion network pre-trained for video inpainting, which is fine-tuned in an end-to-end manner to bridge the domain gap and reconstruct coherent motion fields. To further improve the efficiency and reliability of propagation, we introduce a reference-guided latent propagation that effectively propagates source content across frames. Extensive experiments demonstrate that our method achieves superior temporal coherence and visual realism with efficient inference, surpassing even prior state-of-the-art methods that require input-specific adaptation.
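The abstract does not specify how the reference-guided latent propagation injects source content into the diffusion process. For orientation only, the sketch below shows a generic, well-known alternative, RePaint-style masked latent replacement. At each denoising step the latent of the seen region is overwritten with a forward-noised copy of the encoded source latent, z_t = sqrt(ᾱ_t) z_0 + sqrt(1 - ᾱ_t) ε, and the generated latent is kept elsewhere. The function names and the ᾱ_t schedule argument are hypothetical, and this is not the paper's mechanism.

```python
import numpy as np


def noise_known_latent(z0, alpha_bar_t, rng):
    """Forward-diffuse a clean latent z0 to step t (standard DDPM q(z_t | z_0))."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps


def composite_latent(z_gen, z0_known, known_mask, alpha_bar_t, rng):
    """Keep the seen region from the (noised) source latent, the rest from z_gen.

    z_gen, z0_known: arrays of shape (C, H, W); known_mask: bool array of
    shape (H, W), True where the latent corresponds to originally seen content.
    """
    z_known_t = noise_known_latent(z0_known, alpha_bar_t, rng)
    m = known_mask.astype(z_gen.dtype)  # broadcasts over the channel axis
    return m * z_known_t + (1.0 - m) * z_gen
```

At ᾱ_t = 1 (no noise) the seen region reproduces the source latent exactly, which is the latent-space analogue of "keep the seen"; the unseen region is left entirely to the generator.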

Problem

Research questions and friction points this paper is trying to address.

video outpainting

temporal coherence

spatial fidelity

dynamic scenes

frame consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

video outpainting

flow-based propagation

temporal coherence