🤖 AI Summary
Generative image compression suffers from inflexible bit-rate control: existing methods struggle to achieve both high reconstruction fidelity and strong generalization across a wide, fine-grained range of bit rates. This paper proposes Control-GIC, presented as the first framework to enable continuous, controllable bit-rate adaptation. The method builds on the VQGAN architecture and integrates three key components: (1) an information-density-driven granularity adaptation mechanism that explicitly links local image complexity to the number of vector quantization (VQ) indices spent on each patch; (2) a probabilistic conditional hierarchical decoder that reconstructs multi-granularity features and aggregates them conditionally; and (3) variable-length VQ with density-aware granularity allocation and layered probabilistic modeling. Experiments show improvements over recent state-of-the-art methods, with a better trade-off between rate-distortion performance and perceptual quality.
📝 Abstract
Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of adapting their rate flexibly to diverse compression requirements and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, termed Control-GIC, the first capable of fine-grained bitrate adaptation across a broad spectrum while ensuring high-fidelity, general-purpose compression. Control-GIC is grounded in a VQGAN framework that encodes an image as a sequence of variable-length codes (i.e., VQ-indices), which can be losslessly compressed and whose length correlates directly and positively with the bitrate. Drawing inspiration from the classical coding principle, we correlate the information density of local image patches with their granular representations. We can therefore flexibly determine a suitable allocation of granularity across patches, dynamically adjusting the number of VQ-indices to reach the desired compression rate. We further develop a probabilistic conditional decoder that retrieves previously encoded multi-granularity representations according to the transmitted codes and then reconstructs hierarchical granular features, formulated as conditional probabilities, enabling more informative aggregation that improves reconstruction realism. Our experiments show that Control-GIC allows highly flexible and controllable bitrate adaptation and outperforms recent state-of-the-art methods.
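The link between patch information density, granularity, and bitrate can be illustrated with a small sketch. This is not the authors' implementation. It assumes a histogram-based Shannon entropy as the information-density measure, three granularity levels that spend 1, 4, or 16 VQ-indices per patch, and a codebook of 1024 entries. The function names, `ratios`, `codes_per_level`, and `codebook_size` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def patch_entropy(gray, patch=16, bins=32):
    """Shannon entropy (in bits) of the intensity histogram of each patch.

    Assumption: histogram entropy stands in for "information density".
    """
    h, w = gray.shape
    rows, cols = h // patch, w // patch
    ent = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = gray[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            p = hist[hist > 0] / block.size
            ent[r, c] = -(p * np.log2(p)).sum()
    return ent


def allocate_granularity(ent, ratios=(0.5, 0.3, 0.2), codes_per_level=(1, 4, 16)):
    """Rank patches by entropy and give denser patches finer granularity.

    ratios[i] is the fraction of patches assigned to level i (coarse to fine).
    Returns the number of VQ-indices spent on each patch.
    """
    order = np.argsort(ent.ravel(), kind="stable")  # low entropy first
    n = order.size
    bounds = np.round(np.cumsum(ratios) * n).astype(int)
    bounds[-1] = n  # guard against floating-point rounding
    codes = np.empty(n, dtype=int)
    start = 0
    for level, end in enumerate(bounds):
        codes[order[start:end]] = codes_per_level[level]
        start = end
    return codes.reshape(ent.shape)


def estimate_bpp(codes, image_shape, codebook_size=1024):
    """Bits per pixel for fixed-length indices, before any entropy coding.

    Ignores the side information needed to signal the granularity map.
    """
    h, w = image_shape
    return codes.sum() * np.log2(codebook_size) / (h * w)


# Demo: left half flat, right half noisy.
rng = np.random.default_rng(0)
image = np.full((64, 64), 128.0)
image[:, 32:] = rng.integers(0, 256, size=(64, 32))

entropy_map = patch_entropy(image)
code_map = allocate_granularity(entropy_map)
print(code_map)
print(estimate_bpp(code_map, image.shape))
```

Shifting mass in `ratios` toward the fine level raises the total number of indices and therefore the estimated bitrate, which is the kind of control knob the abstract describes. In this sketch the flat patches always land in the coarse level, while the noisy patches receive more indices.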