Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based video super-resolution (VSR) methods suffer from excessive computational overhead and a heavy learning burden due to redundant reuse of low-quality video content. To address this, we propose OASIS, an efficient one-step diffusion model for VSR. Our key contributions are: (1) an attention specialization routing mechanism that dynamically assigns distinct functional roles to individual attention heads based on their spatiotemporal behavior patterns, preserving pretrained knowledge while significantly reducing redundant computation; and (2) a progressive training strategy that enhances robustness to complex, realistic degradations. Extensive experiments on both synthetic and real-world benchmarks demonstrate that OASIS achieves state-of-the-art (SOTA) performance in terms of both accuracy and efficiency. Notably, it attains up to a 6.2× inference speedup over SeedVR2, establishing a new trade-off frontier between speed and reconstruction fidelity.

📝 Abstract
Diffusion models have recently shown promising results for video super-resolution (VSR). However, directly adapting generative diffusion models to VSR can result in redundancy, since low-quality videos already preserve substantial content information. Such redundancy leads to increased computational overhead and learning burden, as the model performs superfluous operations and must learn to filter out irrelevant information. To address this problem, we propose OASIS, an efficient one-step diffusion model with attention specialization for real-world video super-resolution. OASIS incorporates an attention specialization routing that assigns attention heads to different patterns according to their intrinsic behaviors. This routing mitigates redundancy while effectively preserving pretrained knowledge, allowing diffusion models to better adapt to VSR and achieve stronger performance. Moreover, we propose a simple yet effective progressive training strategy, which starts with temporally consistent degradations and then shifts to inconsistent settings. This strategy facilitates learning under complex degradations. Extensive experiments demonstrate that OASIS achieves state-of-the-art performance on both synthetic and real-world datasets. OASIS also provides superior inference speed, offering a 6.2× speedup over one-step diffusion baselines such as SeedVR2. The code will be available at https://github.com/jp-guo/OASIS.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational redundancy in video super-resolution diffusion models
Optimizing attention mechanisms for efficient video quality enhancement
Improving inference speed while maintaining state-of-the-art performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step diffusion model with attention specialization routing
Progressive training strategy for complex degradation learning
Attention-head routing that assigns heads to distinct patterns, reducing computational redundancy
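To make the routing idea concrete, here is a minimal, hypothetical sketch (not the official OASIS implementation): each attention head is assigned the pattern, e.g. spatial, temporal, or window-local, whose precomputed affinity score best matches the head's intrinsic behavior, so specialized heads can skip redundant full attention. The pattern names and scoring scheme below are illustrative assumptions.

```python
# Hypothetical sketch of attention specialization routing (illustrative only,
# not the official OASIS code): route each head to its best-matching pattern.

def route_heads(head_scores):
    """Assign each attention head the pattern with the highest affinity.

    head_scores: one dict per head, mapping pattern name -> affinity score
    (assumed to be precomputed from the head's spatiotemporal behavior).
    """
    return [max(scores, key=scores.get) for scores in head_scores]

# Toy affinity scores for three heads (made-up values).
scores = [
    {"spatial": 0.7, "temporal": 0.2, "window": 0.1},
    {"spatial": 0.1, "temporal": 0.8, "window": 0.1},
    {"spatial": 0.3, "temporal": 0.2, "window": 0.5},
]
assignments = route_heads(scores)
print(assignments)  # ['spatial', 'temporal', 'window']
```

Once routed, each head would attend only within its assigned pattern's token subset, which is where the computational savings over dense spatiotemporal attention would come from.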
Jinpei Guo
Carnegie Mellon University
Deep Learning, Combinatorial Optimization, Generative AI

Yifei Ji
Shanghai Jiao Tong University

Zheng Chen
Shanghai Jiao Tong University

Yufei Wang
Snap Inc.

Sizhuo Ma
Snap Inc.
computer vision, computational imaging

Yong Guo
South China University of Technology

Yulun Zhang
Shanghai Jiao Tong University

Jian Wang
Snap Inc.