Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors

📅 2026-03-16

🤖 AI Summary

This work addresses a longstanding challenge in ultra-low-bitrate image compression: achieving high fidelity and high perceptual quality at the same time. To this end, we propose a generative decoding paradigm that formulates image reconstruction as a virtual temporal evolution from a semantics-preserving anchor frame to the target image. Using a pretrained video diffusion model (VDM) as a temporal prior, the approach introduces an anchor-guided next-frame prediction mechanism that improves both semantic consistency and visual realism. On the CLIC2020 test set, the method achieves over 50% bitrate savings relative to DiffC across LPIPS, DISTS, FID, and KID, decodes up to 5× faster, and shows superior objective and subjective performance.
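
The "over 50% bitrate savings" figure compares bitrates at matched quality. The summary does not say how the comparison was computed. One common approach is the Bjøntegaard delta rate (BD-rate). The sketch below is a simplified, hypothetical variant. Standard BD-rate fits a cubic polynomial. This version instead interpolates log-bitrate piecewise-linearly against the quality metric, then averages the log-rate gap over the overlapping quality range. The function name `rate_savings` and any sample points are illustrative assumptions, not the paper's protocol or data.

```python
import math
from typing import List, Sequence, Tuple

# (bits per pixel, quality score) pairs. Quality can be any metric
# (e.g. LPIPS, where lower is better) as long as its values are distinct.
RDPoints = Sequence[Tuple[float, float]]


def _interp(xs: List[float], ys: List[float], x: float) -> float:
    """Piecewise-linear interpolation of y(x); xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")


def _log_rate_curve(points: RDPoints) -> Tuple[List[float], List[float]]:
    """Sort points by quality and return (quality, log(bpp)) lists."""
    pts = sorted((q, math.log(bpp)) for bpp, q in points)
    return [q for q, _ in pts], [lr for _, lr in pts]


def rate_savings(ref: RDPoints, test: RDPoints, steps: int = 1000) -> float:
    """Hypothetical simplified BD-rate: average relative bitrate change of
    `test` vs `ref` at equal quality. Negative means `test` needs fewer bits
    (e.g. -0.5 corresponds to 50% bitrate savings)."""
    q_ref, lr_ref = _log_rate_curve(ref)
    q_test, lr_test = _log_rate_curve(test)
    lo, hi = max(q_ref[0], q_test[0]), min(q_ref[-1], q_test[-1])
    if lo >= hi:
        raise ValueError("rate-quality curves do not overlap")
    # Trapezoidal integration of the log-rate gap over the shared quality range.
    total = 0.0
    for k in range(steps + 1):
        q = min(hi, lo + (hi - lo) * k / steps)  # clamp against rounding past hi
        w = 0.5 if k in (0, steps) else 1.0
        total += w * (_interp(q_test, lr_test, q) - _interp(q_ref, lr_ref, q))
    mean_log_gap = total / steps
    return math.exp(mean_log_gap) - 1.0
```

Suppose a method needs half the bits of the reference at every quality level. In that case `rate_savings` returns approximately -0.5, which is "50% bitrate savings" in the sense used above. Averaging in the log domain keeps the result a ratio, so it does not depend on the absolute bpp scale.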

📝 Abstract

We present a novel paradigm for ultra-low-bitrate image compression (ULB-IC) that exploits the "temporal" evolution in generative image compression. Specifically, we define an explicit intermediate state during decoding: a compact anchor frame that preserves the scene geometry and semantic layout while discarding high-frequency details. We then reinterpret generative decoding as a virtual temporal transition from this anchor to the final reconstructed image. To model this progression, we leverage a pretrained video diffusion model (VDM) as a temporal prior. The anchor frame serves as the initial frame and the original image as the target frame, which turns decoding into a next-frame prediction task. Image diffusion-based ULB-IC models do not start from such an anchor. Our decoding instead proceeds from a visible, semantically faithful anchor, which improves both fidelity and realism in perceptual image compression. Extensive experiments demonstrate that our method achieves superior objective and subjective performance. On the CLIC2020 test set, it achieves over 50% bitrate savings compared to DiffC across LPIPS, DISTS, FID, and KID. It also delivers a decoding speedup of up to 5×. Code will be released later.
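
The decoding paradigm described in the abstract can be sketched as a two-frame clip. Only the framing comes from the abstract: the anchor is frame 0 and the reconstruction is frame 1. The rest is a minimal illustration under stated assumptions:

- `make_anchor` stands in for the compact anchor using simple block averaging. The paper's actual anchor construction and coding are not described here.
- `predict_next_frame` is a hypothetical callable that takes the place of the pretrained VDM.
- `side_info` is a placeholder for whatever the decoder parses from the bitstream.

```python
from typing import Any, Callable, List

Image = List[List[float]]  # grayscale H x W image

# Hypothetical interface for a video diffusion model used as a temporal prior:
# given the frames so far and decoder-side information (assumed here to be
# data parsed from the bitstream), it returns the next frame.
NextFramePredictor = Callable[[List[Image], Any], Image]


def make_anchor(img: Image, factor: int) -> Image:
    """Illustrative low-frequency anchor: replace each factor x factor block
    by its mean. This keeps the coarse layout and discards fine detail; it is
    an assumption, not the paper's anchor construction."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = [[0.0] * w for _ in range(h)]
    for bi in range(0, h, factor):
        for bj in range(0, w, factor):
            rows = range(bi, bi + factor)
            cols = range(bj, bj + factor)
            mean = sum(img[i][j] for i in rows for j in cols) / factor**2
            for i in rows:
                for j in cols:
                    out[i][j] = mean
    return out


def decode_as_next_frame(anchor: Image, side_info: Any,
                         predict_next_frame: NextFramePredictor) -> Image:
    """Treat decoding as a two-frame clip: frame 0 is the visible anchor and
    frame 1, the reconstruction, is predicted by the temporal prior."""
    clip = [anchor]
    return predict_next_frame(clip, side_info)
```

In a real decoder, `predict_next_frame` would presumably wrap the VDM's conditional generation. Here any callable with that signature works, so the framing can be exercised without a model. The point of the design is that the prior starts from a visible, semantically faithful frame rather than from pure noise.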

Problem

ultra-low-bitrate image compression

perceptual image compression

generative image compression

next-frame prediction

video diffusion priors

Innovation

ultra-low-bitrate image compression

video diffusion model

anchor frame