Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the longstanding challenge in ultra-low-bitrate image compression of simultaneously achieving high fidelity and perceptual quality. To this end, we propose a novel generative decoding paradigm that formulates image reconstruction as a virtual temporal evolution from a semantics-preserving anchor frame to the target image. Leveraging a pretrained video diffusion model (VDM) as a temporal prior, our approach introduces an anchor-guided next-frame prediction mechanism to enhance both semantic consistency and visual realism. Evaluated on the CLIC2020 test set, the proposed method achieves over 50% bitrate savings across LPIPS, DISTS, FID, and KID compared to DiffC while speeding up decoding by up to 5×, with superior objective and subjective performance reported in the experiments.

📝 Abstract
We present a novel paradigm for ultra-low-bitrate image compression (ULB-IC) that exploits the "temporal" evolution in generative image compression. Specifically, we define an explicit intermediate state during decoding: a compact anchor frame, which preserves the scene geometry and semantic layout while discarding high-frequency details. We then reinterpret generative decoding as a virtual temporal transition from this anchor to the final reconstructed image. To model this progression, we leverage a pretrained video diffusion model (VDM) as a temporal prior: the anchor frame serves as the initial frame and the original image as the target frame, transforming the decoding process into a next-frame prediction task. In contrast to image diffusion-based ULB-IC models, our decoding proceeds from a visible, semantically faithful anchor, which improves both fidelity and realism for perceptual image compression. Extensive experiments demonstrate that our method achieves superior objective and subjective performance. On the CLIC2020 test set, our method achieves over 50% bitrate savings across LPIPS, DISTS, FID, and KID compared to DiffC, while also delivering a significant decoding speedup of up to 5×. Code will be released later.
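The anchor-to-target framing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, since their code has not been released. It assumes a simple block-average low-pass as a stand-in for the anchor frame, and the function names (`make_anchor`, `as_two_frame_clip`, `bits_per_pixel`) are hypothetical. The video diffusion model itself is not called. The sketch only shows the data layout such a next-frame formulation implies and how bitrate is usually counted in bits per pixel.

```python
import numpy as np


def make_anchor(image: np.ndarray, factor: int = 8) -> np.ndarray:
    """Assumed stand-in for an anchor frame: block-average, then upsample.

    Keeps the coarse layout of an (H, W, C) image and discards fine detail.
    The paper's actual anchor construction may differ.
    """
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    # Average each factor x factor block per channel.
    coarse = image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    # Nearest-neighbour upsample back to the original resolution.
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


def as_two_frame_clip(anchor: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Stack anchor (frame 0) and target (frame 1) into a (T=2, H, W, C) clip."""
    if anchor.shape != target.shape:
        raise ValueError("anchor and target must share a shape")
    return np.stack([anchor, target], axis=0)


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Bitrate of a compressed image in bits per pixel (bpp)."""
    return 8.0 * num_bytes / (height * width)
```

In this layout, frame 0 is what the decoder can see and frame 1 is what a temporal prior would be asked to predict. Under this bitrate measure, a 1 KiB payload for a 512×512 image corresponds to 0.03125 bpp.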
Problem

Research questions and friction points this paper is trying to address.

ultra-low-bitrate image compression
perceptual image compression
generative image compression
next-frame prediction
video diffusion priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

ultra-low-bitrate image compression
video diffusion model
anchor frame
next-frame prediction
generative decoding