🤖 AI Summary
Existing vector quantization (VQ)-based image compression methods lack end-to-end joint rate-distortion (RD) optimization, which leaves representation learning disconnected from entropy modeling and makes it hard to preserve structural fidelity while compressing efficiently at extremely low bitrates. This work proposes RDVQ, a framework that explicitly incorporates an entropy constraint into vector quantization through a differentiable relaxation of the codebook distribution, enabling end-to-end RD optimization. The resulting entropy loss directly shapes the learned latent prior, while an autoregressive entropy model provides accurate entropy estimation and test-time rate control. Unifying image tokenization and compression in a single framework, RDVQ achieves competitive or superior perceptual quality with significantly fewer parameters. Compared with RDEIC on DIV2K-val, it reduces bitrate by up to 75.71% as measured on DISTS and by up to 37.63% as measured on LPIPS.
📝 Abstract
The rapid growth of visual data under stringent storage and bandwidth constraints makes extremely low-bitrate image compression increasingly important. While Vector Quantization (VQ) offers strong structural fidelity, existing methods lack a principled mechanism for joint rate-distortion (RD) optimization due to the disconnect between representation learning and entropy modeling. We propose RDVQ, a unified framework that enables end-to-end RD optimization for VQ-based compression via a differentiable relaxation of the codebook distribution, allowing the entropy loss to directly shape the latent prior. We further develop an autoregressive entropy model that supports accurate entropy modeling and test-time rate control. Extensive experiments demonstrate that RDVQ achieves strong performance at extremely low bitrates with a lightweight architecture, attaining competitive or superior perceptual quality with significantly fewer parameters. Compared with RDEIC, RDVQ reduces bitrate by up to 75.71% on DISTS and 37.63% on LPIPS on DIV2K-val. Beyond empirical gains, RDVQ introduces an entropy-constrained formulation of VQ, highlighting the potential for a more unified view of image tokenization and compression. The code will be available at https://github.com/CVL-UESTC/RDVQ.
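The abstract does not spell out the form of the differentiable relaxation, so the following is only a minimal sketch of one common way such an entropy-constrained VQ objective could look, not the authors' implementation. It assumes a softmax over negative squared distances as the relaxed codebook assignment, a fixed categorical prior standing in for the learned entropy model, and a loss of the form distortion + λ · rate. The names `soft_assignment`, `rd_loss`, `tau`, and `lam` are hypothetical.

```python
import math

def soft_assignment(z, codebook, tau=1.0):
    """Relaxed (soft) assignment of latent z to codebook entries.

    Assumption: softmax over negative squared distances with temperature tau.
    As tau -> 0 this approaches the hard nearest-neighbour assignment of VQ.
    """
    dists = [sum((zi - ei) ** 2 for zi, ei in zip(z, e)) for e in codebook]
    d_min = min(dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / tau) for d in dists]
    total = sum(weights)
    return [w / total for w in weights], dists

def rd_loss(z, codebook, prior, lam=0.1, tau=1.0):
    """Illustrative rate-distortion loss: expected distortion + lam * expected bits.

    `prior` is a stand-in for the entropy model's predicted distribution over
    codebook indices; every entry must be strictly positive.
    """
    probs, dists = soft_assignment(z, codebook, tau)
    distortion = sum(p * d for p, d in zip(probs, dists))
    # Cross-entropy in bits between the soft assignment and the prior.
    rate_bits = -sum(p * math.log2(q) for p, q in zip(probs, prior))
    return distortion + lam * rate_bits, distortion, rate_bits

# Toy example: three 2-D codewords and a prior favouring the first one.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
prior = [0.5, 0.25, 0.25]
z = [0.1, -0.1]
loss, distortion, rate_bits = rd_loss(z, codebook, prior, lam=0.1, tau=0.05)
```

Because the rate term depends on the soft assignment probabilities rather than on a hard argmin, its gradient can reach both the latent and the codebook in an autodiff framework, which is the property the abstract attributes to the relaxation. In this toy setting, raising `lam` rewards assignments to codewords the prior considers likely, trading distortion for fewer bits.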