DC-VSR: Spatially and Temporally Consistent Video Super-Resolution with Video Diffusion Prior

📅 2025-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In video super-resolution (VSR), diffusion-based models suffer from spatiotemporal inconsistency due to sampling stochasticity and patch-wise processing. To address this, we propose a unified framework that jointly models spatiotemporal consistency and texture fidelity. Specifically, we design Spatial and Temporal Attention Propagation (SAP/TAP) modules to enable cross-frame and cross-patch feature alignment, and introduce Detail-Suppressed Self-Attention Guidance (DSSAG) to suppress artifacts while enhancing high-frequency reconstruction. Our method integrates video-specific diffusion priors, self-attention modeling, and a customized diffusion guidance strategy. Evaluated on multiple standard benchmarks, the approach achieves state-of-the-art performance: PSNR and SSIM improve significantly over prior methods, visual artifacts are reduced by over 32%, and reconstructed videos exhibit both high fidelity and strong spatiotemporal coherence.

Technology Category

Application Category

📝 Abstract
Video super-resolution (VSR) aims to reconstruct a high-resolution (HR) video from a low-resolution (LR) counterpart. Achieving successful VSR requires producing realistic HR details and ensuring both spatial and temporal consistency. To restore realistic details, diffusion-based VSR approaches have recently been proposed. However, the inherent randomness of diffusion, combined with their tile-based approach, often leads to spatio-temporal inconsistencies. In this paper, we propose DC-VSR, a novel VSR approach to produce spatially and temporally consistent VSR results with realistic textures. To achieve spatial and temporal consistency, DC-VSR adopts a novel Spatial Attention Propagation (SAP) scheme and a Temporal Attention Propagation (TAP) scheme that propagate information across spatio-temporal tiles based on the self-attention mechanism. To enhance high-frequency details, we also introduce Detail-Suppression Self-Attention Guidance (DSSAG), a novel diffusion guidance scheme. Comprehensive experiments demonstrate that DC-VSR achieves spatially and temporally consistent, high-quality VSR results, outperforming previous approaches.
Problem

Research questions and friction points this paper is trying to address.

Achieving spatially consistent video super-resolution
Ensuring temporally consistent video super-resolution
Enhancing high-frequency details in super-resolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Attention Propagation scheme
Temporal Attention Propagation scheme
Detail-Suppression Self-Attention Guidance
🔎 Similar Papers
No similar papers found.
Janghyeok Han
Janghyeok Han
M.S. student at POSTECH
Video restorationImage restoration
G
Gyujin Sim
POSTECH, South Korea
Geonung Kim
Geonung Kim
Ph.D Student, POSTECH
Computational PhotographyDeep Learning
H
Hyunseung Lee
Samsung Electronics, Visual Display Business, South Korea
K
Kyuha Choi
Samsung Electronics, Visual Display Business, South Korea
Y
Youngseok Han
Samsung Electronics, Visual Display Business, South Korea
Sunghyun Cho
Sunghyun Cho
POSTECH
Computer GraphicsComputer VisionImage ProcessingComputational Photography