ProGVC: Progressive-based Generative Video Compression via Auto-Regressive Context Modeling

📅 2026-03-18
🤖 AI Summary
Existing learned video codecs rarely offer native support for variable bitrates or progressive transmission, and their generative modules are typically only weakly coupled with entropy coding, which limits compression efficiency. ProGVC addresses both issues with a unified framework that combines progressive transmission, efficient entropy coding, and generative detail reconstruction in a single codec. It encodes video as multi-scale residual tokens and uses a Transformer-based autoregressive context model for coarse-to-fine progressive coding; the model's token probabilities drive entropy coding and also let the decoder predict fine-scale tokens that were not transmitted. The reported experiments show promising perceptual quality at low bitrates, together with flexible bitrate adaptation and practical scalability.
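
The coarse-to-fine idea behind multi-scale residual tokens can be illustrated with a small sketch. This is not the ProGVC tokenizer: it assumes a 1D signal, block averaging in place of a learned encoder, and plain scalar rounding in place of a learned codebook. All function names and the `step` parameter are hypothetical.

```python
from typing import List


def downsample(x: List[float], size: int) -> List[float]:
    """Average equal-length blocks so the output has `size` entries."""
    block = len(x) // size
    return [sum(x[i * block:(i + 1) * block]) / block for i in range(size)]


def upsample(x: List[float], size: int) -> List[float]:
    """Repeat each entry so the output has `size` entries."""
    repeat = size // len(x)
    return [v for v in x for _ in range(repeat)]


def encode(signal: List[float], scales: List[int], step: float = 0.5) -> List[List[int]]:
    """Quantize what is still unexplained at each scale, coarse to fine."""
    residual = list(signal)
    token_maps = []
    for s in scales:
        # Assumption: integer tokens come from rounding, not a learned codebook.
        tokens = [round(v / step) for v in downsample(residual, s)]
        token_maps.append(tokens)
        approx = upsample([t * step for t in tokens], len(signal))
        residual = [r - a for r, a in zip(residual, approx)]
    return token_maps


def decode(token_maps: List[List[int]], length: int, step: float = 0.5) -> List[float]:
    """Sum the contributions of whichever scales were received."""
    out = [0.0] * length
    for tokens in token_maps:
        approx = upsample([t * step for t in tokens], length)
        out = [o + a for o, a in zip(out, approx)]
    return out


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


signal = [1.0, 1.2, 0.8, 1.1, 3.0, 3.3, 2.9, 3.1]
scales = [1, 2, 4, 8]  # number of tokens per scale, coarse to fine
token_maps = encode(signal, scales)

# Progressive decoding: reconstruct from only the first k scales.
errors = [mse(signal, decode(token_maps[:k], len(signal)))
          for k in range(1, len(scales) + 1)]
for k, err in enumerate(errors, start=1):
    print(f"scales received: {k}, MSE: {err:.4f}")
```

Truncating `token_maps` after any scale still yields a valid reconstruction, which is what makes rate adaptation possible without re-encoding. In this toy the error can only stay the same or shrink as more scales arrive.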

📝 Abstract
Perceptual video compression leverages generative priors to reconstruct realistic textures and motions at low bitrates. However, existing perceptual codecs often lack native support for variable bitrate and progressive delivery, and their generative modules are weakly coupled with entropy coding, limiting bitrate reduction. Inspired by the next-scale prediction in the Visual Auto-Regressive (VAR) models, we propose ProGVC, a Progressive-based Generative Video Compression framework that unifies progressive transmission, efficient entropy coding, and detail synthesis within a single codec. ProGVC encodes videos into hierarchical multi-scale residual token maps, enabling flexible rate adaptation by transmitting a coarse-to-fine subset of scales in a progressive manner. A Transformer-based multi-scale autoregressive context model estimates token probabilities, utilized both for efficient entropy coding of the transmitted tokens and for predicting truncated fine-scale tokens at the decoder to restore perceptual details. Extensive experiments demonstrate that as a new coding paradigm, ProGVC delivers promising perceptual compression performance at low bitrates while offering practical scalability at the same time.
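
The dual role of the context model described above can be sketched in a few lines: the same conditional distribution gives an ideal code length for tokens that are transmitted, and a guess for tokens that are truncated. The model here is a hand-written stand-in, not the paper's Transformer. `toy_context_model`, the vocabulary size, and the probabilities are assumptions made for illustration, and the decoder-side prediction uses argmax only because the abstract does not say how truncated tokens are chosen.

```python
import math
from typing import List

VOCAB_SIZE = 4  # hypothetical token vocabulary


def toy_context_model(parent_token: int) -> List[float]:
    """Hypothetical stand-in for the context model.

    Returns p(fine token | coarse parent token), assuming a fine token
    usually matches its parent.
    """
    probs = [0.1] * VOCAB_SIZE
    probs[parent_token] = 0.7
    return probs


def ideal_bits(fine_tokens: List[int], parents: List[int]) -> float:
    """Ideal entropy-coding cost: sum of -log2 p(token | context)."""
    return sum(-math.log2(toy_context_model(p)[t])
               for t, p in zip(fine_tokens, parents))


def predict_truncated(parents: List[int]) -> List[int]:
    """Decoder-side guess for fine tokens that were never transmitted."""
    return [max(range(VOCAB_SIZE), key=lambda t, p=p: toy_context_model(p)[t])
            for p in parents]


parents = [0, 1, 2, 3]  # coarse tokens already available at the decoder
fine = [0, 1, 2, 0]     # fine tokens the encoder would like to send

bits_with_context = ideal_bits(fine, parents)
bits_uniform = len(fine) * math.log2(VOCAB_SIZE)
print(f"with context model: {bits_with_context:.2f} bits")
print(f"uniform baseline:   {bits_uniform:.2f} bits")
print("predicted if truncated:", predict_truncated(parents))
```

The better the model predicts the fine tokens from coarser context, the fewer bits entropy coding needs when they are sent, and the closer the decoder's guess is when they are not.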

Problem

Research questions and friction points this paper is trying to address.

perceptual video compression
variable bitrate
progressive delivery
entropy coding
generative priors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Video Compression
Generative Video Coding
Auto-Regressive Context Modeling
Multi-Scale Token Representation
Perceptual Quality