DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution

πŸ“… 2025-07-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Real-world video super-resolution (VSR) faces three key challenges: complex and unknown degradations, temporal inconsistency, and blurred fine details. To address these, we propose a diffusion-enhanced framework that explicitly decouples appearance reconstruction from motion modelingβ€”the first such approach in diffusion-based VSR. Our method introduces a motion-aligned bidirectional sampling strategy, synergistically leveraging the detail-synthesis capability of image super-resolution diffusion models and the temporal priors inherent in video diffusion models. We integrate Stable Video Diffusion with ControlNet: a reference high-resolution image guides appearance enhancement, while a video-conditioned ControlNet enforces motion consistency across frames. Extensive experiments demonstrate state-of-the-art performance on both real-world and AIGC-generated video benchmarks, achieving significant improvements in long-sequence detail fidelity and temporal coherence.

πŸ“ Abstract
Real-world video super-resolution (VSR) presents significant challenges due to complex and unpredictable degradations. Although some recent methods utilize image diffusion models for VSR and have shown improved detail generation capabilities, they still struggle to produce temporally consistent frames. We attempt to use Stable Video Diffusion (SVD) combined with ControlNet to address this issue. However, due to the intrinsic image-animation characteristics of SVD, it is challenging to generate fine details using only low-quality videos. To tackle this problem, we propose DAM-VSR, an appearance and motion disentanglement framework for VSR. This framework disentangles VSR into appearance enhancement and motion control problems. Specifically, appearance enhancement is achieved through reference image super-resolution, while motion control is achieved through video ControlNet. This disentanglement fully leverages the generative prior of video diffusion models and the detail generation capabilities of image super-resolution models. Furthermore, equipped with the proposed motion-aligned bidirectional sampling strategy, DAM-VSR can conduct VSR on longer input videos. DAM-VSR achieves state-of-the-art performance on real-world data and AIGC data, demonstrating its powerful detail generation capabilities.
Problem

Research questions and friction points this paper addresses.

Addressing temporal inconsistency in video super-resolution frames
Enhancing fine details using low-quality video inputs
Disentangling appearance and motion for improved VSR performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangles VSR into appearance and motion
Uses ControlNet for motion control
Employs motion-aligned bidirectional sampling
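The disentanglement described above can be illustrated with a toy sketch: appearance is enhanced once on a reference frame, while a bidirectional pass over the sequence (one pass anchored at the first frame, one at the last) enforces temporal consistency. All function names and the frame representation (frames as lists of floats) are illustrative placeholders standing in for the diffusion components, not the authors' implementation.

```python
def enhance_reference(frame):
    # Stand-in for reference image super-resolution (detail generation).
    return [2.0 * x for x in frame]

def propagate(frames, reference):
    # Stand-in for the motion-conditioned pass (video ControlNet):
    # pull each frame toward the enhanced reference.
    return [[(f + r) / 2 for f, r in zip(frame, reference)] for frame in frames]

def bidirectional_sample(frames):
    # Motion-aligned bidirectional sampling: average a forward pass
    # (anchored at the first frame) with a backward pass (anchored at
    # the last frame) so long sequences stay consistent at both ends.
    ref_fwd = enhance_reference(frames[0])
    ref_bwd = enhance_reference(frames[-1])
    fwd = propagate(frames, ref_fwd)
    bwd = propagate(frames[::-1], ref_bwd)[::-1]
    return [[(a + b) / 2 for a, b in zip(f, g)] for f, g in zip(fwd, bwd)]

frames = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
out = bidirectional_sample(frames)
print(out)  # each frame blends detail from both sequence ends
```

The averaging of the two passes is the key point: a purely forward pass would drift away from the detail anchored at the far end of a long sequence, which is what the bidirectional strategy is designed to prevent.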
πŸ‘₯ Authors
Zhe Kong, Sun Yat-sen University (Generative model; Image and video synthesis)
Le Li, Tianjin University, China
Yong Zhang, Meituan, China
Feng Gao, Meituan, China
Shaoshu Yang, School of Artificial Intelligence, University of Chinese Academy of Sciences, China
Tao Wang, Nanjing University, China
Kaihao Zhang, Australian National University (Deep learning; Computer vision)
Zhuoliang Kang, Meituan, China
Xiaoming Wei, Meituan (Computer vision; Machine learning)
Guanying Chen, Shenzhen Campus of Sun Yat-sen University, China
Wenhan Luo, Associate Professor, HKUST (Creative AI; Generative Model; Computer Vision; Machine Learning)