Asymmetric VAE for One-Step Video Super-Resolution Acceleration

📅 2025-09-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing video super-resolution (VSR) diffusion models achieve single-step sampling but remain bottlenecked by costly VAE decoding during inference. To address this, we propose FastVSR, a computationally efficient VSR framework. Methodologically, it introduces: (1) an asymmetric f16 VAE with a high spatial compression ratio, drastically reducing latent dimensionality; (2) a lower-bound-guided training strategy to enhance VAE stability and accelerate convergence; and (3) a parameter-free upsampling scheme combining pixel rearrangement and channel duplication, eliminating additional computational overhead. Experiments demonstrate that FastVSR achieves a 111.9× speedup over multi-step diffusion baselines and a 3.92× acceleration over state-of-the-art single-step methods, while preserving reconstruction quality. By significantly lowering inference latency and memory footprint without sacrificing fidelity, FastVSR establishes a new paradigm for efficient, practical VSR deployment.

πŸ“ Abstract
Diffusion models have significant advantages in real-world video super-resolution and have demonstrated strong performance in prior research. In recent diffusion-based video super-resolution (VSR) models, the number of sampling steps has been reduced to just one, yet substantial room remains for further optimization of inference efficiency. In this paper, we propose FastVSR, which achieves large reductions in computational cost by employing a high-compression VAE (spatial compression ratio of 16, denoted f16). We design the structure of the f16 VAE and introduce a stable training framework, employing pixel shuffle and channel replication to achieve additional upsampling. Furthermore, we propose a lower-bound-guided training strategy, which introduces a simpler training objective as a lower bound on the VAE's performance, making the training process more stable and easier to converge. Experimental results show that FastVSR achieves speedups of 111.9 times compared to multi-step models and 3.92 times compared to existing one-step models. We will release code and models at https://github.com/JianzeLi-114/FastVSR.
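The abstract states a spatial compression ratio of 16. A quick back-of-envelope sketch of why this matters (assumptions ours: an f8 VAE baseline and a 1280×720 input frame, neither specified in the abstract) shows the decoder and diffusion backbone operate on roughly 4× fewer latent positions per frame:

```python
def latent_positions(h, w, f):
    # Number of spatial positions in the latent grid for a VAE
    # with spatial downsampling stride f (assumes h, w divisible by f).
    return (h // f) * (w // f)

h, w = 720, 1280  # one 720p frame (illustrative choice)
f8 = latent_positions(h, w, 8)    # conventional f8 VAE
f16 = latent_positions(h, w, 16)  # FastVSR's f16 VAE
print(f8, f16, f8 / f16)  # 14400 3600 4.0
```

The 4× reduction in latent positions compounds across every frame and every network layer that runs in latent space, which is where the headline speedups plausibly come from.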
Problem

Research questions and friction points this paper is trying to address.

Accelerating one-step video super-resolution with asymmetric VAE
Reducing computational cost through high compression VAE design
Improving training stability with lower-bound-guided optimization strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

High compression VAE with f16 spatial ratio
Pixel shuffle and channel replication upsampling
Lower-bound-guided training for stable convergence
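The parameter-free upsampling described above (channel replication followed by pixel shuffle) can be sketched in a few lines of NumPy. This is our minimal reading of the idea, not the paper's released code; function names and the choice of NumPy are ours. Duplicating each channel r² times and then pixel-shuffling yields an upsampled feature map with zero learned parameters:

```python
import numpy as np

def pixel_shuffle(x, r):
    # Rearrange channels into space: (B, C*r^2, H, W) -> (B, C, r*H, r*W),
    # following the standard depth-to-space layout (as in torch.nn.PixelShuffle).
    b, c, h, w = x.shape
    oc = c // (r * r)
    x = x.reshape(b, oc, r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # -> (b, oc, h, r, w, r)
    return x.reshape(b, oc, h * r, w * r)

def parameter_free_upsample(x, r=2):
    # Channel duplication + pixel shuffle: each input value fills an
    # r x r output block, so no weights are needed for upsampling.
    x_rep = np.repeat(x, r * r, axis=1)  # (B, C, H, W) -> (B, C*r^2, H, W)
    return pixel_shuffle(x_rep, r)

x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)
y = parameter_free_upsample(x, r=2)
print(y.shape)  # (1, 1, 4, 4)
```

Because each duplicated channel group holds identical values, the result reduces to nearest-neighbor-style upsampling, but expressed as a channel rearrangement that slots directly into a decoder's tensor layout without any interpolation kernel or learned weights.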
Jianze Li
Shanghai Jiao Tong University
Computer Vision · Image Restoration
Yong Guo
South China University of Technology
Yulun Zhang
Shanghai Jiao Tong University
Xiaokang Yang
Shanghai Jiao Tong University