Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

📅 2026-05-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two inefficiencies in zero-shot video inverse problem solving: high initial latency, caused by restoring the whole video at once, and low throughput, caused by the repeated VAE passes needed to enforce measurement consistency in pixel space. To overcome these limitations, the authors propose AVIS, a novel framework that introduces autoregressive diffusion to this task for the first time. AVIS restores video in a streaming, chunk-wise manner, initializes reverse diffusion from a measurement-consistent estimate to reduce the number of sampling steps, and integrates a VAE acceleration strategy to improve efficiency. The AVIS Flash variant enforces measurement consistency only on the first chunk, trading some reconstruction quality for speed. Compared with leading non-autoregressive solvers, AVIS reduces initial latency from 114 seconds to 4 seconds and increases throughput from 0.71 to 1.18 FPS. AVIS Flash reaches 5.91 FPS on a single RTX 4090 GPU while maintaining competitive reconstruction fidelity.
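To put the reported figures in proportion, the short calculation below derives the relative gains from the numbers quoted above. Only the five input values come from the source; the variable names are illustrative, and the ratios are simple arithmetic rather than results reported by the authors.

```python
# Figures quoted in the summary (baseline = leading non-autoregressive solver).
baseline_latency_s = 114.0
avis_latency_s = 4.0
baseline_fps = 0.71
avis_fps = 1.18
flash_fps = 5.91

# Derived ratios (not reported in the source; computed here for illustration).
latency_reduction = baseline_latency_s / avis_latency_s
avis_throughput_gain = avis_fps / baseline_fps
flash_throughput_gain = flash_fps / baseline_fps

print(f"Initial latency reduction: {latency_reduction:.1f}x")
print(f"AVIS throughput gain: {avis_throughput_gain:.2f}x")
print(f"AVIS Flash throughput gain: {flash_throughput_gain:.2f}x")
```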
📝 Abstract
Diffusion models provide powerful priors for zero-shot video inverse problems, but their real-time deployment is hindered by two inefficiencies: high initial latency caused by holistic video restoration, and low throughput resulting from multiple VAE passes to enforce measurement consistency in pixel space. To overcome these limitations, we propose Autoregressive Video Inverse problem Solver (AVIS). The AVIS framework leverages autoregressive video diffusion models to restore videos in a streaming manner, naturally eliminating latency bottlenecks. Specifically, AVIS initializes reverse diffusion with a measurement-consistent estimate, reducing the required sampling steps. Compared to leading non-autoregressive solvers, AVIS drastically reduces initial latency from 114s to 4s and increases throughput from 0.71 to 1.18 FPS while achieving superior restoration quality. We further introduce a highly accelerated variant, dubbed AVIS Flash, that enforces measurement consistency solely on the first chunk. AVIS Flash substantially boosts throughput to 5.91 FPS on a single RTX 4090 GPU while maintaining competitive performance and achieving a favorable efficiency-performance trade-off, paving the way toward real-time deployment.
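The streaming idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for an autoregressive video diffusion model, the inverse problem is assumed to be pixel masking (inpainting), there is no VAE, and the step count and noise schedule are arbitrary. It only shows the control flow: chunks are restored one at a time conditioned on the previous chunk, each chunk starts from a measurement-consistent estimate rather than pure noise, and a `flash` flag enforces consistency on the first chunk only.

```python
import numpy as np

def toy_denoiser(x_t, context):
    """Hypothetical stand-in for an autoregressive diffusion denoiser.
    It pulls the noisy chunk toward the last frame of the previous chunk."""
    if context is None:
        return x_t
    return 0.5 * x_t + 0.5 * context[-1]

def restore_chunk(y, mask, context, enforce_consistency=True,
                  num_steps=4, start_sigma=0.3, rng=None):
    """Restore one chunk of shape (T, H, W) from masked measurements y."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Measurement-consistent initialization (toy version): start from the
    # observed pixels plus moderate noise instead of pure Gaussian noise.
    x = mask * y + start_sigma * rng.standard_normal(y.shape)
    # Noise levels decrease linearly to zero (an arbitrary toy schedule).
    for sigma in np.linspace(start_sigma, 0.0, num_steps + 1)[1:]:
        x0 = toy_denoiser(x, context)
        if enforce_consistency:
            # Keep observed pixels, fill unobserved ones from the denoiser.
            x0 = mask * y + (1 - mask) * x0
        x = x0 + sigma * rng.standard_normal(y.shape)
    return x

def stream_restore(measurements, masks, chunk_size=4, flash=False):
    """Yield restored chunks one at a time, so the first frames are
    available before the rest of the video has been processed."""
    context = None
    for start in range(0, len(measurements), chunk_size):
        y = measurements[start:start + chunk_size]
        m = masks[start:start + chunk_size]
        # Flash-style variant: enforce consistency on the first chunk only.
        enforce = (context is None) or not flash
        restored = restore_chunk(y, m, context, enforce_consistency=enforce)
        context = restored
        yield restored

# Toy data: 10 frames of 8x8 pixels with roughly half the pixels observed.
data_rng = np.random.default_rng(1)
video = data_rng.random((10, 8, 8))
masks = (data_rng.random((10, 8, 8)) > 0.5).astype(float)
measurements = masks * video

chunks = list(stream_restore(measurements, masks, chunk_size=4))
flash_chunks = list(stream_restore(measurements, masks, chunk_size=4, flash=True))
```

Because `stream_restore` is a generator, a consumer can display the first chunk as soon as it is yielded, which is the source of the latency reduction the abstract describes. In a real solver the denoiser would be a pretrained autoregressive video diffusion model operating in latent space.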
Problem

Research questions and friction points this paper is trying to address.

video inverse problems
diffusion models
real-time deployment
latency
throughput
Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive diffusion
video inverse problems
streaming restoration
measurement consistency
real-time video processing