RealisVSR: Detail-enhanced Diffusion for Real-World 4K Video Super-Resolution

📅 2025-07-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses three key challenges in real-world 4K video super-resolution (VSR): inconsistent temporal modeling, insufficient high-frequency detail recovery, and the absence of high-quality evaluation benchmarks. Methodologically, we propose a detail-enhanced diffusion model featuring: (i) a consistency-preserving ControlNet architecture to strengthen inter-frame temporal modeling; (ii) a novel high-frequency correction diffusion loss coupled with higher-order moment loss, integrated with wavelet decomposition, HOG feature constraints, and spatiotemporal guidance; and (iii) RealisVideo-4K, the first publicly available 4K VSR benchmark. Leveraging the Wan2.1 video diffusion framework and a lightweight training strategy, our method achieves efficient training using only 5–25% of typical data volume. Extensive experiments demonstrate state-of-the-art performance across multiple mainstream VSR datasets, particularly excelling in 4K upscaling and texture restoration, thereby advancing the practical deployment of real-scenario video restoration.

📝 Abstract
Video Super-Resolution (VSR) has achieved significant progress through diffusion models, effectively addressing the over-smoothing issues inherent in GAN-based methods. Despite recent advances, three critical challenges persist in the VSR community: 1) inconsistent modeling of temporal dynamics in foundational models; 2) limited high-frequency detail recovery under complex real-world degradations; and 3) insufficient evaluation of detail enhancement and 4K super-resolution, as current methods primarily rely on 720P datasets with inadequate details. To address these challenges, we propose RealisVSR, a high-frequency detail-enhanced video diffusion model with three core innovations: 1) a Consistency Preserved ControlNet (CPC) architecture integrated with the Wan2.1 video diffusion model to capture smooth and complex motion and suppress artifacts; 2) a High-Frequency Rectified Diffusion Loss (HR-Loss) combining wavelet decomposition and HOG feature constraints for texture restoration; 3) RealisVideo-4K, the first public 4K VSR benchmark, containing 1,000 high-definition video-text pairs. Leveraging the advanced spatio-temporal guidance of Wan2.1, our method requires only 5-25% of the training data volume compared to existing approaches. Extensive experiments on VSR benchmarks (REDS, SPMCS, UDM10, YouTube-HQ, VideoLQ, RealisVideo-720P) demonstrate the superiority of our method, particularly in ultra-high-resolution scenarios.
Problem

Research questions and friction points this paper is trying to address.

Inconsistent temporal dynamics modeling in video super-resolution
Limited high-frequency detail recovery in real-world degradations
Insufficient evaluation of 4K super-resolution and detail enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Consistency Preserved ControlNet for motion modeling
High-Frequency Rectified Diffusion Loss for texture
RealisVideo-4K benchmark for 4K VSR evaluation
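To make the wavelet part of the High-Frequency Rectified Diffusion Loss more concrete, here is a minimal, dependency-free sketch of a loss that penalizes errors in high-frequency wavelet subbands more heavily than errors in the low-frequency band. It is not the paper's HR-Loss: the function names `haar_dwt2`, `mse`, and `hf_weighted_loss`, the one-level Haar transform, and the `hf_weight` parameter are all assumptions made for illustration, and the HOG constraint and the diffusion objective are omitted.

```python
# Illustrative sketch only; NOT the HR-Loss from the paper.
# All names and the weighting scheme are hypothetical.

def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of a grayscale image.

    `img` is a list of rows with even height and width.
    Returns (ll, lh, hl, hh), each half the input size.
    """
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2.0)  # local average (low frequency)
            row_lh.append((a + b - c - d) / 2.0)  # top row minus bottom row
            row_hl.append((a - b + c - d) / 2.0)  # left column minus right column
            row_hh.append((a - b - c + d) / 2.0)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def mse(x, y):
    """Mean squared error between two equally sized 2D lists."""
    n = len(x) * len(x[0])
    return sum((p - q) ** 2 for rx, ry in zip(x, y) for p, q in zip(rx, ry)) / n


def hf_weighted_loss(pred, target, hf_weight=2.0):
    """Low-band MSE plus `hf_weight` times the summed MSE of the detail bands."""
    p_ll, *p_high = haar_dwt2(pred)
    t_ll, *t_high = haar_dwt2(target)
    low = mse(p_ll, t_ll)
    high = sum(mse(p, t) for p, t in zip(p_high, t_high))
    return low + hf_weight * high
```

With this sketch, an over-smoothed prediction such as `[[0.5, 0.5], [0.5, 0.5]]` against a checkerboard target `[[0, 1], [1, 0]]` has zero error in the low-frequency band, so the whole penalty comes from the diagonal detail band. That is the behaviour a texture-oriented loss is meant to encourage: a blurry output is not rewarded simply for matching local averages.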
Weisong Zhao
Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences, DAMO Academy
Jingkai Zhou
Independent Researcher
Computer vision
Xiangyu Zhu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences
Weihua Chen
Alibaba DAMO Academy, previously NLPR, CASIA
Computer Vision
Xiao-Yu Zhang
Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences
Zhen Lei
Associate Professor, OSCO Research Chair in Off-site Construction
Offsite Construction, Construction Engineering and Management
Fan Wang
DAMO Academy