Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

📅 2024-07-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the temporal flickering and inconsistent restoration that arise when pre-trained image diffusion models are applied directly to video restoration, owing to the lack of explicit temporal modeling, this paper proposes the first zero-shot video restoration framework that transfers image diffusion models to video tasks without requiring any video training data. Methodologically, it introduces four key components: (1) short- and long-range temporal self-attention, (2) a temporal consistency guidance loss, (3) a spatial-temporal noise-sharing mechanism, and (4) an early-stopping sampling strategy. Experiments show that the approach significantly suppresses temporal artifacts while improving structural and textural consistency across frames, and it achieves state-of-the-art performance on multiple zero-shot video restoration tasks, including deblurring, super-resolution, and denoising, without task-specific fine-tuning. Moreover, the framework supports plug-and-play integration with existing image diffusion models.
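
As a rough illustration of the first component, below is a minimal PyTorch sketch of short- and long-range temporal attention that reuses a frozen image model's per-frame attention projections and lets each frame attend to nearby frames plus strided distant frames. The window size, stride, and function name are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of short- and long-range temporal attention.
# Assumption: q, k, v come from the frozen image model's self-attention projections,
# computed independently per frame; window/stride values are illustrative.
import torch
import torch.nn.functional as F


def slr_temporal_attention(q, k, v, frame_idx, short_range=2, long_stride=8):
    """q, k, v: (num_frames, num_tokens, dim). Returns attention output for one frame."""
    num_frames = k.shape[0]
    # short-range neighbours around the current frame
    lo, hi = max(0, frame_idx - short_range), min(num_frames, frame_idx + short_range + 1)
    short_ids = list(range(lo, hi))
    # long-range frames sampled with a fixed stride across the clip
    long_ids = list(range(0, num_frames, long_stride))
    ids = sorted(set(short_ids + long_ids))

    k_sel = k[ids].reshape(1, -1, k.shape[-1])   # gather keys from selected frames
    v_sel = v[ids].reshape(1, -1, v.shape[-1])   # gather values from selected frames
    q_cur = q[frame_idx].unsqueeze(0)            # queries of the current frame only
    # standard scaled dot-product attention over the gathered cross-frame tokens
    return F.scaled_dot_product_attention(q_cur, k_sel, v_sel).squeeze(0)
```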

📝 Abstract
Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various image restoration and enhancement tasks. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on a pre-trained image diffusion model. By replacing the spatial self-attention layer with the proposed short-long-range (SLR) temporal attention layer, the pre-trained image diffusion model can take advantage of the temporal correlation between frames. We further propose temporal consistency guidance, spatial-temporal noise sharing, and an early stopping sampling strategy to improve temporally consistent sampling. Our method is a plug-and-play module that can be inserted into any diffusion-based image restoration or enhancement method to further improve its performance. Experimental results demonstrate the superiority of our proposed method. Our code is available at https://github.com/cao-cong/ZVRD.
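
For the spatial-temporal noise sharing mentioned above, a minimal sketch is shown below, assuming latent-space diffusion sampling in PyTorch: all frames reuse one shared noise map, optionally mixed with a small independent per-frame component. The mixing weight and function name are hypothetical choices for illustration, not the paper's exact formulation.

```python
# Minimal sketch (assumptions, not the paper's exact formulation) of noise sharing:
# frames share one noise map so sampler stochasticity does not vary arbitrarily
# across the clip, which would otherwise cause flickering.
import torch


def sample_shared_noise(num_frames, shape, frame_weight=0.1, device="cpu"):
    """shape: (C, H, W) noise shape of a single frame.
    frame_weight controls how much independent per-frame noise is mixed in;
    the value here is an illustrative choice."""
    shared = torch.randn(1, *shape, device=device)             # one noise map for the whole clip
    per_frame = torch.randn(num_frames, *shape, device=device)  # small per-frame variation
    noise = (1 - frame_weight) ** 0.5 * shared + frame_weight ** 0.5 * per_frame
    return noise  # (num_frames, C, H, W), unit variance per element
```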
Problem

Research questions and friction points this paper is trying to address.

Flickering Issue
Inter-frame Relationship
Video Restoration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained Image Restoration
Temporal Consistency
Temporal Attention Layer
Cong Cao
School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Huanjing Yue
School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Xin Liu
School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Computer Vision and Pattern Recognition Laboratory, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland
Jingyu Yang
School of Electrical and Information Engineering, Tianjin University, Tianjin, China