DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution

📅 2025-07-01
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Real-world video super-resolution (VSR) faces three key challenges: complex and unknown degradations, temporal inconsistency, and blurred fine details. To address these, we propose a diffusion-based framework that explicitly decouples appearance reconstruction from motion modeling, the first such approach in diffusion-based VSR. We integrate Stable Video Diffusion with ControlNet: a super-resolved reference image guides appearance enhancement, while a video-conditioned ControlNet enforces motion consistency across frames. This decoupling combines the detail-synthesis capability of image super-resolution models with the temporal priors of video diffusion models, and a motion-aligned bidirectional sampling strategy extends the method to longer input videos. Extensive experiments demonstrate state-of-the-art performance on both real-world and AIGC video benchmarks, with improved detail fidelity and temporal coherence on long sequences.

πŸ“ Abstract
Real-world video super-resolution (VSR) presents significant challenges due to complex and unpredictable degradations. Although some recent methods utilize image diffusion models for VSR and have shown improved detail generation capabilities, they still struggle to produce temporally consistent frames. We attempt to use Stable Video Diffusion (SVD) combined with ControlNet to address this issue. However, due to the intrinsic image-animation characteristics of SVD, it is challenging to generate fine details using only low-quality videos. To tackle this problem, we propose DAM-VSR, an appearance and motion disentanglement framework for VSR. This framework disentangles VSR into appearance enhancement and motion control problems. Specifically, appearance enhancement is achieved through reference image super-resolution, while motion control is achieved through video ControlNet. This disentanglement fully leverages the generative prior of video diffusion models and the detail generation capabilities of image super-resolution models. Furthermore, equipped with the proposed motion-aligned bidirectional sampling strategy, DAM-VSR can conduct VSR on longer input videos. DAM-VSR achieves state-of-the-art performance on real-world data and AIGC data, demonstrating its powerful detail generation capabilities.
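
The split between appearance and motion described in the abstract can be illustrated with a toy data-flow sketch. This is not the DAM-VSR implementation: `toy_image_sr` and `toy_motion_transfer` are hypothetical stand-ins for the image super-resolution model and the ControlNet-guided video diffusion model, frames are plain lists of floats, and using the first frame as the reference is an assumption made only for illustration.

```python
from typing import Callable, List

Frame = List[float]  # toy stand-in for an image tensor


def toy_image_sr(frame: Frame) -> Frame:
    """Hypothetical appearance branch: stands in for an image SR model."""
    return [2.0 * value for value in frame]


def toy_motion_transfer(reference: Frame, lq_frames: List[Frame]) -> List[Frame]:
    """Hypothetical motion branch: stands in for a video model guided by the LQ clip.

    Each output frame is the reference plus the change the low-quality clip
    shows relative to its first frame, so appearance comes from `reference`
    and motion comes from `lq_frames`.
    """
    first = lq_frames[0]
    return [
        [ref + (cur - base) for ref, cur, base in zip(reference, frame, first)]
        for frame in lq_frames
    ]


def disentangled_vsr(
    lq_frames: List[Frame],
    image_sr: Callable[[Frame], Frame],
    motion_model: Callable[[Frame, List[Frame]], List[Frame]],
) -> List[Frame]:
    """Enhance one reference frame, then let the LQ clip drive the motion."""
    if not lq_frames:
        raise ValueError("need at least one low-quality frame")
    # Assumption: the first frame serves as the appearance reference.
    reference = image_sr(lq_frames[0])
    return motion_model(reference, lq_frames)
```

Calling `disentangled_vsr(clip, toy_image_sr, toy_motion_transfer)` on a short clip returns one output frame per input frame. The point of the structure is that either stand-in can be swapped without touching the other, which is the practical benefit of treating appearance and motion as separate problems.
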
Problem

Research questions and friction points this paper is trying to address.

Addressing temporal inconsistency in video super-resolution frames
Generating fine details when only low-quality video is available as input
Disentangling appearance and motion to improve VSR performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

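Disentangles VSR into appearance enhancement and motion control
Uses reference image super-resolution for appearance and a video ControlNet for motion control
Employs motion-aligned bidirectional sampling to handle longer input videos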
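
The summary does not say how the bidirectional sampling combines its two directions. As a rough intuition only, the sketch below assumes a forward pass anchored at the first frame and a backward pass anchored at the last frame, blended with a weight that shifts linearly along the clip. The function name `blend_bidirectional` and the linear weighting are assumptions for illustration, not details taken from the paper.

```python
from typing import List

Frame = List[float]  # toy stand-in for an image tensor


def blend_bidirectional(forward: List[Frame], backward: List[Frame]) -> List[Frame]:
    """Blend two per-frame estimates given in the same temporal order.

    `forward[t]` is assumed to be most reliable near the first frame and
    `backward[t]` near the last frame, so the weight on `backward` grows
    linearly from 0 at the first frame to 1 at the last frame.
    This weighting is a hypothetical choice, not the paper's rule.
    """
    if len(forward) != len(backward):
        raise ValueError("forward and backward passes must have equal length")
    count = len(forward)
    blended = []
    for t in range(count):
        # A single-frame clip has no temporal extent; average the two passes.
        weight = 0.5 if count == 1 else t / (count - 1)
        blended.append(
            [(1.0 - weight) * f + weight * b for f, b in zip(forward[t], backward[t])]
        )
    return blended
```

Under this assumed scheme, each end of the clip stays close to the pass anchored at that end, and the middle frames mix both. For long videos the same idea could be applied per segment, though how segments are chosen is likewise not specified here.
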