DC-VSR: Spatially and Temporally Consistent Video Super-Resolution with Video Diffusion Prior

📅 2025-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In video super-resolution (VSR), diffusion-based models suffer from spatiotemporal inconsistency due to sampling stochasticity and patch-wise processing. To address this, we propose a unified framework that jointly models spatiotemporal consistency and texture fidelity. Specifically, we design Spatial and Temporal Attention Propagation (SAP/TAP) modules to enable cross-frame and cross-patch feature alignment, and introduce Detail-Suppressed Self-Attention Guidance (DSSAG) to suppress artifacts while enhancing high-frequency reconstruction. Our method integrates video-specific diffusion priors, self-attention modeling, and a customized diffusion guidance strategy. Evaluated on multiple standard benchmarks, the approach achieves state-of-the-art performance: PSNR and SSIM improve significantly over prior methods, visual artifacts are reduced by over 32%, and reconstructed videos exhibit both high fidelity and strong spatiotemporal coherence.

Technology Category

Application Category

📝 Abstract

Video super-resolution (VSR) aims to reconstruct a high-resolution (HR) video from a low-resolution (LR) counterpart. Achieving successful VSR requires producing realistic HR details and ensuring both spatial and temporal consistency. To restore realistic details, diffusion-based VSR approaches have recently been proposed. However, the inherent randomness of diffusion, combined with their tile-based approach, often leads to spatio-temporal inconsistencies. In this paper, we propose DC-VSR, a novel VSR approach to produce spatially and temporally consistent VSR results with realistic textures. To achieve spatial and temporal consistency, DC-VSR adopts a novel Spatial Attention Propagation (SAP) scheme and a Temporal Attention Propagation (TAP) scheme that propagate information across spatio-temporal tiles based on the self-attention mechanism. To enhance high-frequency details, we also introduce Detail-Suppression Self-Attention Guidance (DSSAG), a novel diffusion guidance scheme. Comprehensive experiments demonstrate that DC-VSR achieves spatially and temporally consistent, high-quality VSR results, outperforming previous approaches.

Problem

Research questions and friction points this paper is trying to address.

Achieving spatially consistent video super-resolution

Ensuring temporally consistent video super-resolution

Enhancing high-frequency details in super-resolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Attention Propagation scheme

Temporal Attention Propagation scheme

Detail-Suppression Self-Attention Guidance

🔎 Similar Papers

No similar papers found.

Authors to Follow