Temporal-Consistent Video Restoration with Pre-trained Diffusion Models

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses quality degradation and temporal inconsistency in zero-shot video restoration, which arise from approximation errors in the reverse diffusion process of pre-trained diffusion models. The authors propose a maximum a posteriori (MAP) optimization framework operating in the seed space: first, seed-space parameterization eliminates the approximation errors inherent in the reverse process; second, a two-level temporal consistency mechanism is introduced, in which semantic-level consistency is enforced via clustering structure in the seed space to model inter-frame semantic coherence, while pixel-level alignment is achieved through optical flow estimation coupled with iterative refinement for progressive registration. Crucially, the method requires no fine-tuning of the diffusion model, significantly reducing the computational overhead of processing 3D video data. Extensive experiments on multiple video restoration tasks demonstrate state-of-the-art performance, with superior PSNR, SSIM, and user-study scores compared to existing methods.
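To make the seed-space MAP idea concrete, here is a minimal toy sketch. It is not the paper's implementation: the reverse diffusion process is replaced by a fixed linear map `g(z)` (standing in for "the reverse process viewed as a function"), the degradation is a hypothetical pixel mask, and restoration is plain gradient descent on the seed `z` with a Gaussian seed prior. Note that the generator itself is never updated, mirroring the no-fine-tuning claim.

```python
import numpy as np

rng = np.random.default_rng(0)
d_seed, d_pix = 8, 16

# Stand-in for the reverse diffusion process viewed as a fixed function g(z).
A = rng.normal(size=(d_pix, d_seed))
g = lambda z: A @ z

# Hypothetical degradation: mask out half the pixels (a toy inpainting operator).
mask = (np.arange(d_pix) % 2 == 0).astype(float)

z_true = rng.normal(size=d_seed)
y = mask * g(z_true)  # degraded observation

# MAP objective: data fidelity plus a Gaussian prior on the seed.
lam = 1e-3

def loss(z):
    r = mask * g(z) - y
    return r @ r + lam * z @ z

def grad(z):
    r = mask * g(z) - y
    return 2 * A.T @ (mask * r) + 2 * lam * z

# Gradient descent on the seed only; g (the "diffusion model") stays frozen.
z = np.zeros(d_seed)
for _ in range(2000):
    z -= 1e-2 * grad(z)

assert loss(z) < loss(np.zeros(d_seed))
```

Because the frames are parameterized directly through the seed, the data-fidelity term is evaluated on the exact output of `g` rather than on an approximation of the reverse trajectory.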

📝 Abstract
Video restoration (VR) aims to recover high-quality videos from degraded ones. Although recent zero-shot VR methods using pre-trained diffusion models (DMs) show promise, they suffer from approximation errors during reverse diffusion and insufficient temporal consistency. Moreover, because it operates on 3D video data, VR is inherently computationally intensive. In this paper, we advocate viewing the reverse process in DMs as a function and present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors. We also introduce strategies to promote bilevel temporal consistency: semantic consistency by leveraging clustering structures in the seed space, and pixel-level consistency by progressive warping with optical flow refinements. Extensive experiments on multiple video restoration tasks demonstrate superior visual quality and temporal consistency achieved by our method compared to the state-of-the-art.
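The semantic-consistency strategy can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's algorithm: per-frame "seeds" are synthetic vectors, a plain k-means groups frames into clusters (e.g. shots), and each seed is then pulled toward its cluster centroid by a hypothetical strength `alpha`, shrinking within-cluster spread so frames in the same cluster stay semantically coherent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-frame seeds: two "shots" of 5 frames each; frames within
# a shot should remain semantically coherent.
seeds = np.concatenate([
    rng.normal(loc=0.0, scale=0.3, size=(5, 4)),
    rng.normal(loc=3.0, scale=0.3, size=(5, 4)),
])

def kmeans(x, k, iters=20):
    """Plain k-means: alternate nearest-center assignment and centroid update."""
    centers = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels

centers, labels = kmeans(seeds, k=2)

# Pull each frame's seed toward its cluster centroid (strength alpha),
# reducing within-cluster spread while leaving clusters apart.
alpha = 0.5
regularized = (1 - alpha) * seeds + alpha * centers[labels]

def spread(x, lab):
    """Mean within-cluster squared deviation."""
    return np.mean([((x[lab == j] - x[lab == j].mean(0)) ** 2).sum()
                    for j in np.unique(lab)])

assert spread(regularized, labels) < spread(seeds, labels)
```

Pulling toward the assigned centroid by `alpha` scales within-cluster deviations by `(1 - alpha)`, so the regularized seeds are provably tighter within each cluster regardless of how good the clustering is.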
Problem

Research questions and friction points this paper is trying to address.

Address approximation errors in reverse diffusion for video restoration.
Enhance temporal consistency in video restoration using pre-trained diffusion models.
Reduce computational intensity in processing 3D video data for restoration.
Innovation

Methods, ideas, or system contributions that make the work stand out.

A MAP framework that directly parameterizes video frames in the seed space of the diffusion model, eliminating reverse-process approximation errors
Clustering structure in the seed space enforces semantic-level temporal consistency
Progressive warping with optical flow refinements enforces pixel-level temporal consistency
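For the pixel-level strategy, the core operation is warping one frame toward another along a dense flow field. The sketch below is a generic backward bilinear warp, not the paper's pipeline: the flow here is a hand-made constant shift, whereas the paper estimates flow and refines it iteratively for progressive registration.

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp `frame` by a dense flow field via bilinear sampling.

    flow[y, x] = (dy, dx) means the output pixel at (y, x) is sampled
    from (y + dy, x + dx) in `frame` (clipped at the borders).
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    sy = np.clip(ys + flow[..., 0], 0, h - 1)
    sx = np.clip(xs + flow[..., 1], 0, w - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = sy - y0, sx - x0
    return ((1 - wy) * (1 - wx) * frame[y0, x0]
            + (1 - wy) * wx * frame[y0, x1]
            + wy * (1 - wx) * frame[y1, x0]
            + wy * wx * frame[y1, x1])

# Toy check: a constant one-pixel rightward motion between two frames.
prev = np.arange(25, dtype=float).reshape(5, 5)
flow = np.zeros((5, 5, 2))
flow[..., 1] = -1.0  # each output pixel samples from one column to its left
aligned = warp(prev, flow)
assert np.allclose(aligned[:, 1:], prev[:, :-1])
```

In an iterative-refinement loop, one would re-estimate a residual flow between the warped result and the target frame and accumulate it, progressively tightening the alignment.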