🤖 AI Summary
Existing learned image compression methods require training and storing a separate model for each quality level, resulting in high training overhead and inflexible deployment. To address this, we propose a universal perceptual compression framework built on the latent space of pretrained VAEs (e.g., the Stable Diffusion VAE): a frozen pretrained encoder extracts image latents, and only a lightweight, overfitted learnable compression module is introduced in the latent space—enabling single-model, quality-controllable compression across arbitrary bitrates. Crucially, the backbone network remains fixed without fine-tuning, ensuring model-agnostic and resolution-agnostic deployment at merely 25.5 MAC/pixel. Experiments demonstrate that our method achieves perceptual quality on par with state-of-the-art approaches while significantly reducing training cost and decoder resource consumption.
📝 Abstract
Current learned image compression methods often require a separate model for each quality level, making them resource-intensive in terms of both training and storage. To address these limitations, we propose an approach that reuses the latent variables of pretrained models (such as the Stable Diffusion Variational Autoencoder) for perceptual image compression, eliminating the need for distinct models dedicated to different quality levels. We employ overfitted learnable functions to compress the latent representation of the target model at any desired quality level. Because these overfitted functions operate entirely in the latent space, their computational complexity is low: around $25.5$ MAC/pixel for a forward pass on images of $1363 \times 2048$ pixels. This approach uses resources efficiently during both training and decoding. Our method achieves perceptual quality comparable to state-of-the-art learned image compression models while being both model-agnostic and resolution-agnostic, opening up new possibilities for the development of image compression methods.
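The pipeline described above—a frozen pretrained encoder plus a small learnable function operating only in latent space—can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the frozen encoder here is a stand-in convolution (in practice it would be the Stable Diffusion VAE encoder), the `LatentCompressor` architecture and its sizes are assumptions, and the scalar `q` models the idea of one module covering multiple quality levels.

```python
import torch
import torch.nn as nn

class LatentCompressor(nn.Module):
    """Hypothetical lightweight, overfitted function operating purely in latent space."""
    def __init__(self, latent_ch=4, hidden=32):
        super().__init__()
        self.analysis = nn.Sequential(
            nn.Conv2d(latent_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, latent_ch, 3, padding=1),
        )
        self.synthesis = nn.Sequential(
            nn.Conv2d(latent_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, latent_ch, 3, padding=1),
        )

    def forward(self, z, q=1.0):
        # q scales the quantization step: one module, many quality levels.
        y = self.analysis(z) * q
        # Straight-through rounding so gradients flow through quantization.
        y_hat = y + (torch.round(y) - y).detach()
        return self.synthesis(y_hat / q)

# Frozen "pretrained" encoder stand-in (in practice: the SD VAE encoder,
# which downsamples 8x to a 4-channel latent). Kept fixed, never fine-tuned.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)
for p in encoder.parameters():
    p.requires_grad_(False)

x = torch.randn(1, 3, 64, 64)      # toy input image
with torch.no_grad():
    z = encoder(x)                 # latent: (1, 4, 8, 8)
comp = LatentCompressor()
z_hat = comp(z, q=0.5)             # reconstructed latent at one quality setting
print(z.shape, z_hat.shape)
```

During training, only `LatentCompressor` would be overfitted (e.g., per image or per dataset) against a rate-distortion objective in latent space; the frozen backbone makes the scheme model-agnostic, and the fully convolutional module makes it resolution-agnostic.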