Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

📅 2025-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based video super-resolution (VSR) methods suffer from high latency and poor streamability due to their reliance on future frames and multi-step denoising. To address this, we propose the first causal, streamable diffusion VSR framework. Our approach introduces three key innovations: (1) a causally constrained diffusion model that enables online inference using only past frames; (2) a four-step distilled denoiser integrated with an Auto-regressive Temporal Guidance (ARTG) module to enhance temporal consistency; and (3) a lightweight temporal-aware decoder incorporating a Temporal Processor Module (TPM), balancing efficiency and reconstruction quality. On an RTX 4090 GPU, our method processes a single 720p frame in just 0.328 seconds, reducing initial latency from over 4600 seconds to 0.328 seconds, and achieves a 130× speedup over state-of-the-art online VSR methods, alongside a 0.095 improvement in LPIPS.

📝 Abstract
Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, it combines a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) that enhances detail and temporal coherence. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX 4090 GPU and significantly outperforms prior diffusion-based methods. Compared with the online SOTA TMP, it boosts perceptual quality (LPIPS +0.095) while reducing latency by over 130×. Stream-DiffVSR achieves the lowest latency reported for diffusion-based VSR, reducing initial delay from over 4600 seconds to 0.328 seconds, thereby making it the first diffusion VSR method suitable for low-latency online deployment. Project page: https://jamichss.github.io/stream-diffvsr-project-page/
Problem

Research questions and friction points this paper is trying to address.

How to perform video super-resolution in real time using only past frames
How to reduce the latency of diffusion-based video enhancement
How to retain high perceptual quality under efficient online processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal diffusion framework uses only past frames for online processing
Four-step distilled denoiser and auto-regressive guidance accelerate inference
Lightweight temporal decoder enhances detail and coherence efficiently
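The causal streaming loop behind these contributions can be sketched in a toy form: each frame is upscaled using only previously reconstructed frames as guidance, with a fixed small number of denoising steps. This is a minimal illustration under stated assumptions, not the paper's implementation; `mock_denoise`, the `np.kron` upsampling, and the guidance blend are hypothetical stand-ins for the distilled denoiser, the diffusion prior, and the ARTG module.

```python
import numpy as np

def mock_denoise(latent, guidance, steps=4):
    # Stand-in for the four-step distilled denoiser: iteratively
    # blend the noisy latent toward the motion-aligned guidance.
    for _ in range(steps):
        latent = 0.5 * latent + 0.5 * guidance
    return latent

def stream_vsr(frames_lr, scale=4, rng=None):
    """Hypothetical causal streaming loop: each low-res frame is
    super-resolved using only the previous output as guidance."""
    rng = rng or np.random.default_rng(0)
    prev_sr = None
    outputs = []
    for frame in frames_lr:
        # Nearest-neighbor upsample as the initial latent
        # (stand-in for the diffusion prior).
        up = np.kron(frame, np.ones((scale, scale)))
        # ARTG stand-in: guidance comes from the previous SR frame;
        # the first frame has no past, so it guides itself.
        guidance = prev_sr if prev_sr is not None else up
        noisy = up + rng.normal(0.0, 0.1, up.shape)
        sr = mock_denoise(noisy, guidance, steps=4)
        outputs.append(sr)
        prev_sr = sr  # causal: only past frames are ever reused
    return outputs

lr = [np.random.default_rng(i).random((8, 8)) for i in range(3)]
sr = stream_vsr(lr)
print(len(sr), sr[0].shape)  # 3 frames, each upscaled to 32x32
```

Because each iteration depends only on frames already seen, the first output is available after one frame's worth of work, which is the property that gives the method its low initial latency.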