🤖 AI Summary
Existing diffusion-based video super-resolution (VSR) methods carry excessive computational overhead and learning burden, because directly adapting generative diffusion models is redundant when the low-quality input already preserves substantial content information. To address this, we propose OASIS, an efficient one-step diffusion model for real-world VSR. Our key contributions are: (1) an attention specialization routing mechanism that assigns attention heads to different patterns according to their intrinsic behaviors, which mitigates redundancy while preserving pretrained knowledge; and (2) a progressive training strategy that starts with temporally consistent degradations and then shifts to inconsistent ones, which facilitates learning under complex degradations. Extensive experiments show that OASIS achieves state-of-the-art (SOTA) performance on both synthetic and real-world datasets. It is also faster at inference, offering a 6.2× speedup over one-step diffusion baselines such as SeedVR2.
📝 Abstract
Diffusion models have recently shown promising results for video super-resolution (VSR). However, directly adapting generative diffusion models to VSR can result in redundancy, since low-quality videos already preserve substantial content information. Such redundancy leads to increased computational overhead and learning burden, as the model performs superfluous operations and must learn to filter out irrelevant information. To address this problem, we propose OASIS, an efficient $\textbf{o}$ne-step diffusion model with $\textbf{a}$ttention $\textbf{s}$pecialization for real-world v$\textbf{i}$deo $\textbf{s}$uper-resolution. OASIS incorporates an attention specialization routing that assigns attention heads to different patterns according to their intrinsic behaviors. This routing mitigates redundancy while effectively preserving pretrained knowledge, allowing diffusion models to better adapt to VSR and achieve stronger performance. Moreover, we propose a simple yet effective progressive training strategy, which starts with temporally consistent degradations and then shifts to inconsistent settings. This strategy facilitates learning under complex degradations. Extensive experiments demonstrate that OASIS achieves state-of-the-art performance on both synthetic and real-world datasets. OASIS also provides superior inference speed, offering a $\textbf{6.2}\times$ speedup over one-step diffusion baselines such as SeedVR2. The code will be available at https://github.com/jp-guo/OASIS.
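The progressive training strategy is described only at a high level, so the following is a toy sketch of one possible reading, not the OASIS implementation. It assumes that "temporally consistent" means every frame in a clip shares one degradation setting, and that "inconsistent" means each frame draws its own. The function names, the `blur_sigma` and `noise_std` parameters with their ranges, and the hard switch at `switch_step` are all hypothetical.

```python
import random


def sample_degradation(rng):
    """Draw one hypothetical degradation setting (ranges are illustrative only)."""
    return {
        "blur_sigma": rng.uniform(0.2, 3.0),
        "noise_std": rng.uniform(0.0, 0.1),
    }


def degradations_for_clip(num_frames, consistent, rng):
    """Return one degradation setting per frame of a clip."""
    if consistent:
        # Assumed "consistent" phase: all frames share the same setting.
        shared = sample_degradation(rng)
        return [dict(shared) for _ in range(num_frames)]
    # Assumed "inconsistent" phase: every frame gets its own setting.
    return [sample_degradation(rng) for _ in range(num_frames)]


def use_consistent(step, switch_step):
    """Hypothetical two-phase schedule: consistent first, inconsistent afterwards."""
    return step < switch_step


rng = random.Random(0)
early = degradations_for_clip(4, use_consistent(10, switch_step=1000), rng)
late = degradations_for_clip(4, use_consistent(5000, switch_step=1000), rng)
```

Here `early` holds four identical settings and `late` holds four independently drawn ones. A real schedule could just as well blend the two phases gradually; the abstract does not say which.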