InstaVSR: Taming Diffusion for Efficient and Temporally Consistent Video Super-Resolution

📅 2026-03-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenges of temporal inconsistency and high computational cost in existing video super-resolution diffusion models. The authors propose InstaVSR, a lightweight framework that uniquely integrates pruned single-step diffusion, optical flow-guided temporal regularization, and dual-space (latent and pixel) adversarial learning. This synergistic design enhances both temporal coherence and perceptual quality while markedly improving computational efficiency. Notably, InstaVSR operates within only 7 GB of GPU memory on an RTX 4090 and processes 30 frames of 2K×2K video in under one minute, achieving an effective balance among computational efficiency, inference stability, and visual fidelity.
📝 Abstract
Video super-resolution (VSR) seeks to reconstruct high-resolution frames from low-resolution inputs. While diffusion-based methods have substantially improved perceptual quality, extending them to video remains challenging for two reasons: strong generative priors can introduce temporal instability, and multi-frame diffusion pipelines are often too expensive for practical deployment. To address both challenges simultaneously, we propose InstaVSR, a lightweight diffusion framework for efficient video super-resolution. InstaVSR combines three ingredients: (1) a pruned one-step diffusion backbone that removes several costly components from conventional diffusion-based VSR pipelines, (2) recurrent training with flow-guided temporal regularization to improve frame-to-frame stability, and (3) dual-space adversarial learning in latent and pixel spaces to preserve perceptual quality after backbone simplification. On an NVIDIA RTX 4090, InstaVSR processes a 30-frame video at 2K×2K resolution in under one minute with only 7 GB of memory usage, substantially reducing the computational cost compared to existing diffusion-based methods while maintaining favorable perceptual quality with significantly smoother temporal transitions.
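The flow-guided temporal regularization mentioned above can be understood as a generic warp-and-compare penalty: the previous super-resolved frame is warped toward the current one using optical flow, and the residual between them is penalized. The abstract does not give the paper's exact formulation, so the sketch below is illustrative only: the function names, the nearest-neighbor warping (real pipelines typically use bilinear sampling with occlusion masking), and the L1 penalty are all assumptions.

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward-warp `frame` (H, W, C) using a per-pixel optical flow
    field (H, W, 2). Nearest-neighbor sampling keeps the sketch short;
    production code would use bilinear sampling and occlusion masks."""
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def temporal_consistency_loss(prev_sr, curr_sr, flow):
    """L1 residual between the current SR frame and the flow-warped
    previous SR frame -- one common form of flow-guided regularization."""
    warped_prev = warp_with_flow(prev_sr, flow)
    return float(np.mean(np.abs(curr_sr - warped_prev)))
```

With zero flow and identical consecutive frames the loss is zero; any frame-to-frame flicker that the flow cannot explain shows up as a positive residual, which is what recurrent training would push down.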
Problem

Research questions and friction points this paper is trying to address.

video super-resolution
temporal consistency
diffusion models
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based video super-resolution
temporal consistency
one-step diffusion
flow-guided recurrent training
dual-space adversarial learning