π€ AI Summary
This work addresses the challenges of optimizing quantized latent variables and poor generalization in neural image compression. We propose an information-theoretic structured regularization method grounded in entropy analysis. Our key insight is the theoretical equivalence between minimizing latent entropy and maximizing conditional source entropyβa novel finding that enables the design of an interpretable, plug-and-play, zero-overhead regularizer. Specifically, we explicitly incorporate a negative conditional source entropy term into the rate-distortion objective to jointly optimize compression efficiency and model robustness. The method is fully compatible with end-to-end differentiable architectures (e.g., CNNs and Transformers) and theoretically justified via entropy inequalities and conditional entropy modeling. Experiments demonstrate consistent improvements across diverse compression models and unseen resolutions/datasets: average bitrate reduction of 4.2 bpp, significant gains in PSNR and MS-SSIM, and substantially enhanced generalization capability.
π Abstract
Lossy image compression networks aim to minimize the latent entropy of images while adhering to specific distortion constraints. However, optimizing the neural network can be challenging due to its nature of learning quantized latent representations. In this paper, our key finding is that minimizing the latent entropy is, to some extent, equivalent to maximizing the conditional source entropy, an insight that is deeply rooted in information-theoretic equalities. Building on this insight, we propose a novel structural regularization method for the neural image compression task by incorporating the negative conditional source entropy into the training objective, such that both the optimization efficacy and the model's generalization ability can be promoted. The proposed information-theoretic regularizer is interpretable, plug-and-play, and imposes no inference overheads. Extensive experiments demonstrate its superiority in regularizing the models and further squeezing bits from the latent representation across various compression structures and unseen domains.