IN2OUT: Fine-Tuning Video Inpainting Model for Video Outpainting Using Hierarchical Discriminator

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video outpainting suffers from poor content consistency and blurry artifacts when extending video frames beyond their original boundaries. To address these challenges, we propose a high-fidelity outpainting method leveraging transfer learning from pretrained video inpainting models. First, we fine-tune a strong video inpainting model for outpainting, capitalizing on its inherent capacity for modeling object motion and reconstructing coherent spatiotemporal structures. Second, we introduce a hierarchical discriminator that separately enforces adversarial learning for global layout and local texture fidelity. Third, we design a dedicated multi-scale fusion loss that jointly optimizes temporal coherence, structural integrity, and high-frequency detail preservation. Extensive experiments on multiple benchmarks demonstrate that our approach consistently surpasses state-of-the-art methods, achieving superior performance in both quantitative metrics (PSNR, SSIM, LPIPS) and qualitative visual quality. Code and interactive demos are publicly available.
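The hierarchical adversarial objective summarized above can be sketched as a toy combination of a frame-level (global layout) term and a patch-level (local texture) term. The paper's actual architecture, patch grid, and weightings are not given on this page; `lambda_global`, `lambda_local`, and the logit shapes below are illustrative assumptions:

```python
import numpy as np

def bce_with_logits(logits, targets):
    # Numerically stable binary cross-entropy on raw logits.
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

def hierarchical_adv_loss(global_logit, patch_logits, real,
                          lambda_global=1.0, lambda_local=1.0):
    """Combine one discriminator score for the whole extended frame
    (global layout fidelity) with a grid of patch scores over the
    extended regions (local texture fidelity).
    `real` is 1.0 for real frames, 0.0 for outpainted ones."""
    patch_logits = np.asarray(patch_logits, dtype=np.float64)
    g = bce_with_logits(np.array([global_logit], dtype=np.float64),
                        np.array([real], dtype=np.float64))
    l = bce_with_logits(patch_logits,
                        np.full_like(patch_logits, real))
    return lambda_global * g + lambda_local * l

# Toy example: one global logit plus a 4x4 grid of patch logits.
rng = np.random.default_rng(0)
loss = hierarchical_adv_loss(global_logit=0.3,
                             patch_logits=rng.normal(size=(4, 4)),
                             real=1.0)
```

Separating the two terms lets the generator receive gradient both for scene-level coherence and for sharp local detail, which is the stated motivation for the hierarchical design.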

📝 Abstract
Video outpainting presents a unique challenge of extending the borders while maintaining consistency with the given content. In this paper, we suggest the use of video inpainting models, which excel in object flow learning and reconstruction, for outpainting, rather than solely generating the background as in existing methods. However, directly applying or fine-tuning inpainting models to outpainting has been shown to be ineffective, often leading to blurry results. Our extensive experiments on discriminator designs reveal that a critical component missing in the outpainting fine-tuning process is a discriminator capable of effectively assessing the perceptual quality of the extended areas. To tackle this limitation, we differentiate the objectives of adversarial training into global and local goals and introduce a hierarchical discriminator that meets both objectives. Additionally, we develop a specialized outpainting loss function that leverages both local and global features of the discriminator. Fine-tuning on this adversarial loss function enhances the generator's ability to produce both visually appealing and globally coherent outpainted scenes. Our proposed method outperforms state-of-the-art methods both quantitatively and qualitatively. Supplementary materials, including the demo video and the code, are available on SigPort.
Problem

Research questions and friction points this paper is trying to address.

Extending video borders while maintaining content consistency
Improving blurry results from inpainting models in outpainting
Lack of a discriminator that can assess the perceptual quality of extended areas
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical discriminator for global and local goals
Specialized outpainting loss using discriminator features
Fine-tuning video inpainting models for outpainting
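Repurposing an inpainting model for outpainting, as the third bullet describes, largely comes down to changing where the mask sits: instead of interior holes, the mask marks border extension bands on a widened canvas. A minimal sketch of such a mask builder, with illustrative padding amounts (the paper's actual extension ratios are not stated on this page):

```python
import numpy as np

def outpainting_mask(height, width, pad_left, pad_right):
    """Center the original frame in a wider canvas and mark the
    horizontal extension bands as regions for the model to fill
    (1 = generate, 0 = keep original content)."""
    canvas_width = width + pad_left + pad_right
    mask = np.zeros((height, canvas_width), dtype=np.uint8)
    mask[:, :pad_left] = 1                    # left extension band
    mask[:, canvas_width - pad_right:] = 1    # right extension band
    return mask

# Toy example: a 4x8 frame extended by 2 columns on each side.
m = outpainting_mask(4, 8, pad_left=2, pad_right=2)
```

Feeding such border masks to a pretrained inpainting generator is what allows its motion modeling and reconstruction capacity to be transferred to the outpainting setting.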