AI Summary
This work addresses the challenges of high inference latency and temporal inconsistency in existing diffusion-prior-based video restoration methods when handling complex real-world degradations. The authors propose a single-image diffusion-driven, low-step video restoration framework that leverages a degradation-robust optical flow alignment module to obtain reliable temporal guidance. By integrating adversarial distillation to compress the diffusion sampling steps and introducing a co-optimization strategy to jointly enhance perceptual quality and temporal consistency, the method achieves state-of-the-art restoration performance while accelerating diffusion sampling by 12× compared to current approaches.
Abstract
The integration of diffusion priors with temporal alignment has emerged as a transformative paradigm for video restoration, delivering impressive perceptual quality, yet the practical deployment of such frameworks is severely constrained by prohibitive inference latency and temporal instability when confronted with complex real-world degradations. To address these limitations, we propose \textbf{D$^2$-VR}, a single-image diffusion-based video restoration framework with low-step inference. To obtain precise temporal guidance under severe degradation, we first design a Degradation-Robust Flow Alignment (DRFA) module that leverages confidence-aware attention to filter unreliable motion cues. We then incorporate an adversarial distillation paradigm to compress the diffusion sampling trajectory into a rapid few-step regime. Finally, a synergistic optimization strategy is devised to harmonize perceptual quality with rigorous temporal consistency. Extensive experiments demonstrate that D$^2$-VR achieves state-of-the-art performance while accelerating the sampling process by \textbf{12$\times$} compared to current approaches.
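The abstract does not give implementation details for DRFA, so the following is only a minimal sketch of the general idea: warp a neighboring frame's features to the current frame with optical flow, predict a per-pixel confidence map, and use it to suppress unreliable motion cues before fusion. All names (`ConfidenceAwareAlignment`, `warp`, `conf_head`) are hypothetical, and simple sigmoid gating stands in for the paper's confidence-aware attention; this is not the authors' released implementation.

```python
# Illustrative sketch of confidence-gated flow alignment (PyTorch assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(feat, flow):
    """Backward-warp a feature map (B, C, H, W) with a dense flow field (B, 2, H, W)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow      # absolute sampling coords
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0              # normalize to [-1, 1]
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                 # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)


class ConfidenceAwareAlignment(nn.Module):
    """Fuse current-frame features with flow-warped neighbor features,
    down-weighting positions where the predicted confidence is low
    (e.g., occlusions or flow errors caused by degradation)."""

    def __init__(self, channels):
        super().__init__()
        # Predict a per-pixel confidence map from the (current, warped) pair.
        self.conf_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, cur_feat, nbr_feat, flow):
        warped = warp(nbr_feat, flow)                            # align neighbor to current frame
        conf = self.conf_head(torch.cat([cur_feat, warped], dim=1))
        gated = conf * warped                                    # suppress unreliable motion cues
        return self.fuse(torch.cat([cur_feat, gated], dim=1))


if __name__ == "__main__":
    b, c, h, w = 1, 32, 64, 64
    align = ConfidenceAwareAlignment(c)
    out = align(torch.randn(b, c, h, w), torch.randn(b, c, h, w), torch.zeros(b, 2, h, w))
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```

In this simplified form the confidence map acts as a soft mask over the warped features, which captures the stated goal of filtering unreliable flow guidance; the actual DRFA module may use a richer attention formulation than this per-pixel gate.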