PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses perceptual image compression at ultra-low bit-rates (<0.05 bpp) for bandwidth- and storage-constrained settings. It presents PerCoV2, an open system that extends the formulation of Careil et al. to the Stable Diffusion 3 ecosystem. The key contributions are: (1) implicit hierarchical masked image modeling, which explicitly models the distribution of the discrete hyper-latents to improve entropy coding efficiency; (2) a hybrid generation mode that yields further bit-rate savings; and (3) a comparison of VAR and MaskGIT for entropy modeling. Evaluated on MSCOCO-30k, PerCoV2 achieves higher image fidelity than previous work at even lower bit-rates while maintaining competitive perceptual quality. The system is built solely on public components, and the code and trained models are to be released.

📝 Abstract
We introduce PerCoV2, a novel and open ultra-low bit-rate perceptual image compression system designed for bandwidth- and storage-constrained applications. Building upon prior work by Careil et al., PerCoV2 extends the original formulation to the Stable Diffusion 3 ecosystem and enhances entropy coding efficiency by explicitly modeling the discrete hyper-latent image distribution. To this end, we conduct a comprehensive comparison of recent autoregressive methods (VAR and MaskGIT) for entropy modeling and evaluate our approach on the large-scale MSCOCO-30k benchmark. Compared to previous work, PerCoV2 (i) achieves higher image fidelity at even lower bit-rates while maintaining competitive perceptual quality, (ii) features a hybrid generation mode for further bit-rate savings, and (iii) is built solely on public components. Code and trained models will be released at https://github.com/Nikolai10/PerCoV2.
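
To make the link between entropy modeling and bit-rate concrete, the following is a minimal sketch and not the authors' implementation. It uses the standard fact that an ideal entropy coder spends about -log2 p(token) bits per symbol, so a model that assigns higher probability to the actual hyper-latent tokens lowers the bits per pixel (bpp). The token count, codebook size, image size, and the 0.5 probability are hypothetical values chosen only for illustration.

```python
import math


def ideal_bits(tokens, probs):
    """Ideal code length in bits: sum of -log2 p(token) under the given model."""
    return sum(-math.log2(probs[i][t]) for i, t in enumerate(tokens))


def bits_per_pixel(total_bits, height, width):
    """Bit-rate normalized by the number of image pixels."""
    return total_bits / (height * width)


# Hypothetical setup (not taken from the paper): 64 discrete tokens drawn
# from a 256-entry codebook, describing a 512x512 image.
NUM_TOKENS, CODEBOOK_SIZE = 64, 256
tokens = [3] * NUM_TOKENS

# Baseline: a uniform prior costs log2(256) = 8 bits per token.
uniform = [[1.0 / CODEBOOK_SIZE] * CODEBOOK_SIZE for _ in range(NUM_TOKENS)]


def peaked(true_token, p=0.5):
    """Stand-in for a learned model: probability p on the true token,
    the remainder spread evenly over all other codebook entries."""
    rest = (1.0 - p) / (CODEBOOK_SIZE - 1)
    row = [rest] * CODEBOOK_SIZE
    row[true_token] = p
    return row


learned = [peaked(t) for t in tokens]

bpp_uniform = bits_per_pixel(ideal_bits(tokens, uniform), 512, 512)
bpp_learned = bits_per_pixel(ideal_bits(tokens, learned), 512, 512)
print(f"uniform prior: {bpp_uniform:.6f} bpp")
print(f"peaked model:  {bpp_learned:.6f} bpp")
```

In this toy setting the peaked model needs 1 bit per token instead of 8. A practical codec would pair the predicted probabilities with an arithmetic or range coder, which approaches these ideal code lengths.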
Problem

Research questions and friction points this paper is trying to address.

Develops ultra-low bit-rate image compression for constrained applications.
Enhances entropy coding efficiency using discrete hyper-latent modeling.
Improves image fidelity and perceptual quality at lower bit-rates.
Innovation

Methods, ideas, or system contributions that make the work stand out.

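Extends the formulation of Careil et al. to the Stable Diffusion 3 ecosystem.
Improves entropy coding efficiency by explicitly modeling the discrete hyper-latent distribution.
Built solely on public components, with code and trained models to be released.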
Nikolai Körber
Technical University of Munich
Eduard Kromer
University of Applied Sciences Landshut
Andreas Siebert
University of Applied Sciences Landshut
Sascha Hauke
University of Applied Sciences Landshut
Daniel Mueller-Gritschneder
TU Wien
Embedded System Design, System Modeling and Simulation, Fault-tolerance, TinyML, HW Security
Björn Schuller
Technical University of Munich