Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

📅 2024-07-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the temporal flickering and inconsistent restoration that arise when pre-trained image diffusion models are applied directly to video restoration, owing to the lack of explicit temporal modeling, this paper proposes the first zero-shot video restoration framework that transfers image diffusion models to video tasks without requiring any video training data. Methodologically, it introduces four key components: (1) short- and long-range temporal self-attention, (2) a temporal consistency guidance loss, (3) a spatial-temporal noise-sharing mechanism, and (4) an early-stopping sampling strategy. Experiments show that the approach significantly suppresses temporal artifacts while improving structural and textural consistency across frames, and it achieves state-of-the-art performance on multiple zero-shot video restoration tasks, including deblurring, super-resolution, and denoising, without task-specific fine-tuning. Moreover, the framework supports plug-and-play integration with existing image diffusion models.
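
As a rough illustration of the first component, below is a minimal PyTorch sketch of short- and long-range temporal attention that reuses a frozen image model's per-frame attention projections and lets each frame attend to nearby frames plus strided distant frames. The window size, stride, and function name are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of short- and long-range temporal attention.
# Assumption: q, k, v come from the frozen image model's self-attention projections,
# computed independently per frame; window/stride values are illustrative.
import torch
import torch.nn.functional as F


def slr_temporal_attention(q, k, v, frame_idx, short_range=2, long_stride=8):
    """q, k, v: (num_frames, num_tokens, dim). Returns attention output for one frame."""
    num_frames = k.shape[0]
    # short-range neighbours around the current frame
    lo, hi = max(0, frame_idx - short_range), min(num_frames, frame_idx + short_range + 1)
    short_ids = list(range(lo, hi))
    # long-range frames sampled with a fixed stride across the clip
    long_ids = list(range(0, num_frames, long_stride))
    ids = sorted(set(short_ids + long_ids))

    k_sel = k[ids].reshape(1, -1, k.shape[-1])   # gather keys from selected frames
    v_sel = v[ids].reshape(1, -1, v.shape[-1])   # gather values from selected frames
    q_cur = q[frame_idx].unsqueeze(0)            # queries of the current frame only
    # standard scaled dot-product attention over the gathered cross-frame tokens
    return F.scaled_dot_product_attention(q_cur, k_sel, v_sel).squeeze(0)
```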

📝 Abstract
Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various image restoration and enhancement tasks. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on a pre-trained image diffusion model. By replacing the spatial self-attention layer with the proposed short-long-range (SLR) temporal attention layer, the pre-trained image diffusion model can take advantage of the temporal correlation between frames. We further propose temporal consistency guidance, spatial-temporal noise sharing, and an early stopping sampling strategy to improve temporally consistent sampling. Our method is a plug-and-play module that can be inserted into any diffusion-based image restoration or enhancement method to further improve its performance. Experimental results demonstrate the superiority of our proposed method. Our code is available at https://github.com/cao-cong/ZVRD.
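
For the spatial-temporal noise sharing mentioned above, a minimal sketch is shown below, assuming latent-space diffusion sampling in PyTorch: all frames reuse one shared noise map, optionally mixed with a small independent per-frame component. The mixing weight and function name are hypothetical choices for illustration, not the paper's exact formulation.

```python
# Minimal sketch (assumptions, not the paper's exact formulation) of noise sharing:
# frames share one noise map so sampler stochasticity does not vary arbitrarily
# across the clip, which would otherwise cause flickering.
import torch


def sample_shared_noise(num_frames, shape, frame_weight=0.1, device="cpu"):
    """shape: (C, H, W) noise shape of a single frame.
    frame_weight controls how much independent per-frame noise is mixed in;
    the value here is an illustrative choice."""
    shared = torch.randn(1, *shape, device=device)             # one noise map for the whole clip
    per_frame = torch.randn(num_frames, *shape, device=device)  # small per-frame variation
    noise = (1 - frame_weight) ** 0.5 * shared + frame_weight ** 0.5 * per_frame
    return noise  # (num_frames, C, H, W), unit variance per element
```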
Problem

Research questions and friction points this paper is trying to address.

Flickering Issue
Inter-frame Relationship
Video Restoration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained Image Restoration
Temporal Consistency
Temporal Attention Layer
Cong Cao
School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Huanjing Yue
School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Xin Liu
School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Computer Vision and Pattern Recognition Laboratory, School of Engineering Science, Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland
Jingyu Yang
School of Electrical and Information Engineering, Tianjin University, Tianjin, China