AI Summary
Real-world video super-resolution (VSR) faces three key challenges: complex and unknown degradations, temporal inconsistency, and blurred fine details. To address these, we propose a diffusion-enhanced framework that explicitly decouples appearance reconstruction from motion modeling, the first such approach in diffusion-based VSR. Our method introduces a motion-aligned bidirectional sampling strategy and combines the detail-synthesis capability of image super-resolution diffusion models with the temporal priors inherent in video diffusion models. We integrate Stable Video Diffusion with ControlNet: a reference high-resolution image guides appearance enhancement, while a video-conditioned ControlNet enforces motion consistency across frames. Extensive experiments demonstrate state-of-the-art performance on both real-world and AIGC video benchmarks, with improved detail fidelity and temporal coherence on long sequences.
Abstract
Real-world video super-resolution (VSR) presents significant challenges due to complex and unpredictable degradations. Although some recent methods utilize image diffusion models for VSR and have shown improved detail generation capabilities, they still struggle to produce temporally consistent frames. We attempt to use Stable Video Diffusion (SVD) combined with ControlNet to address this issue. However, due to the intrinsic image-animation characteristics of SVD, it is challenging to generate fine details using only low-quality videos. To tackle this problem, we propose DAM-VSR, an appearance and motion disentanglement framework for VSR. This framework disentangles VSR into appearance enhancement and motion control problems. Specifically, appearance enhancement is achieved through reference image super-resolution, while motion control is achieved through video ControlNet. This disentanglement fully leverages the generative prior of video diffusion models and the detail generation capabilities of image super-resolution models. Furthermore, equipped with the proposed motion-aligned bidirectional sampling strategy, DAM-VSR can conduct VSR on longer input videos. DAM-VSR achieves state-of-the-art performance on real-world data and AIGC data, demonstrating its powerful detail generation capabilities.