DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models

📅 2024-07-01
🏛️ arXiv.org
📈 Citations: 4
Influential: 2
🤖 AI Summary
To address the temporal inconsistency that arises when pretrained image diffusion restoration models are transferred directly to video, this paper proposes a zero-shot video restoration framework that reuses arbitrary 2D image diffusion restoration models without fine-tuning. The method rests on two core components: (1) a hierarchical latent warping and token merging strategy, applied across keyframes and local frames, that enforces inter-frame consistency in the latent space; and (2) a hybrid correspondence mechanism that combines optical-flow guidance with feature-based nearest-neighbor matching for more robust motion handling. Crucially, the approach requires no retraining across diverse degradation types (including 8× super-resolution and Gaussian noise with σ=75), and it outperforms fully supervised methods under extreme degradations. It also demonstrates strong cross-dataset generalization, validating its effectiveness beyond domain-specific training.

📝 Abstract
This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models. Traditional video restoration methods often need retraining for different settings and struggle with limited generalization across various degradation types and datasets. Our approach uses a hierarchical token merging strategy for keyframes and local frames, combined with a hybrid correspondence mechanism that blends optical flow and feature-based nearest neighbor matching (latent merging). We show that our method not only achieves top performance in zero-shot video restoration but also significantly surpasses trained models in generalization across diverse datasets and extreme degradations (8× super-resolution and high-standard deviation video denoising). We present evidence through quantitative metrics and visual comparisons on various challenging datasets. Additionally, our technique works with any 2D restoration diffusion model, offering a versatile and powerful tool for video enhancement tasks without extensive retraining. This research leads to more efficient and widely applicable video restoration technologies, supporting advancements in fields that require high-quality video output. See our project page for video results and source code at https://jimmycv07.github.io/DiffIR2VR_web/.
Problem

Research questions and friction points this paper is trying to address.

Enables image diffusion models for video restoration without retraining
Addresses temporal inconsistencies in video restoration with hierarchical warping
Achieves consistent high-quality results across diverse degradation conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical latent warping for temporal consistency
Hybrid token merging with flow and features
Zero-shot video restoration without retraining
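The hybrid correspondence idea above can be illustrated with a minimal sketch: for each token in the current frame, accept the optical-flow-proposed keyframe match when it is feature-consistent, and otherwise fall back to a feature nearest neighbor. This is an assumption-laden toy, not the paper's exact algorithm; the function name, the cosine-similarity test, and the threshold rule are all illustrative choices.

```python
import numpy as np

def hybrid_match(key_feats, cur_feats, flow_idx, sim_thresh=0.8):
    """Toy hybrid correspondence (illustrative, not the paper's code).

    key_feats: (N, D) keyframe token features
    cur_feats: (M, D) current-frame token features
    flow_idx:  (M,)   keyframe index proposed by optical flow per token
    Returns a (M,) array of matched keyframe indices.
    """
    # L2-normalize so dot products are cosine similarities
    k = key_feats / np.linalg.norm(key_feats, axis=1, keepdims=True)
    c = cur_feats / np.linalg.norm(cur_feats, axis=1, keepdims=True)
    sims = c @ k.T                        # (M, N) cosine similarities
    nn_idx = sims.argmax(axis=1)          # feature nearest neighbor
    flow_sim = sims[np.arange(len(c)), flow_idx]
    # Keep the flow match where it is feature-consistent, else fall back
    return np.where(flow_sim >= sim_thresh, flow_idx, nn_idx)
```

In this sketch the flow proposal wins whenever its cosine similarity clears the threshold, so feature matching only corrects tokens where the flow is unreliable (e.g. occlusions or fast motion).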