🤖 AI Summary
This review article examines high-fidelity image compression at extremely low bitrates, where reconstruction fidelity and perceptual quality must be optimized jointly. The surveyed methods encode images into compact latent embeddings and employ a diffusion model at the decoder to iteratively refine reconstructions toward the natural image manifold. The article situates generative lossy compression within the rate–distortion–perception trade-off framework, revealing its intrinsic connections to inverse problem solving and channel simulation. It also covers an emerging paradigm that models the compression channel itself as a diffusion process. The surveyed results show substantial gains in perceptual quality at ultra-low bitrates, and the article clarifies key technical pathways and open challenges in this research direction.
📝 Abstract
Popularized by their strong image generation performance, diffusion and related methods for generative modeling have found widespread success in visual media applications. In particular, diffusion methods have enabled new approaches to data compression, where realistic reconstructions can be generated at extremely low bit-rates. This article provides a unifying review of recent diffusion-based methods for generative lossy compression, with a focus on image compression. These methods generally encode the source into an embedding and employ a diffusion model to iteratively refine it in the decoding procedure, such that the final reconstruction approximately follows the ground truth data distribution. The embedding can take various forms and is typically transmitted via an auxiliary entropy model, and recent methods also explore the use of diffusion models themselves for information transmission via channel simulation. We review representative approaches through the lens of rate-distortion-perception theory, highlighting the role of common randomness and connections to inverse problems, and identify open challenges.
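To make the decoding idea above concrete, here is a minimal toy sketch (not any surveyed method's actual algorithm): an "encoder" produces a compact quantized embedding of a 1-D signal, and the "decoder" forms a coarse reconstruction and then iteratively refines it with gradient steps on a quadratic smoothness energy, a hypothetical stand-in for a diffusion model's learned denoising updates that pull the estimate toward the data manifold.

```python
import numpy as np

def encode(x, factor=4, levels=16):
    """Compact embedding: block-average downsampling plus uniform quantization."""
    z = x.reshape(-1, factor).mean(axis=1)
    lo, hi = z.min(), z.max()
    q = np.round((z - lo) / (hi - lo + 1e-9) * (levels - 1))
    return q, lo, hi

def dequantize(q, lo, hi, levels=16):
    """Invert the uniform quantizer back to real values."""
    return q / (levels - 1) * (hi - lo + 1e-9) + lo

def refine(x0, steps=20, step_size=0.3):
    """Iterative refinement: gradient descent on a smoothness energy, a toy
    stand-in for a diffusion model's iterative denoising at the decoder."""
    x = x0.copy()
    for _ in range(steps):
        lap = np.zeros_like(x)
        lap[1:-1] = x[:-2] - 2.0 * x[1:-1] + x[2:]  # discrete Laplacian
        x += step_size * lap                        # nudge toward the "prior"
    return x

t = np.linspace(0.0, 2.0 * np.pi, 64)
x = np.sin(t)                                       # ground-truth "image"

q, lo, hi = encode(x)
x0 = np.repeat(dequantize(q, lo, hi), 4)            # coarse initial decode
xhat = refine(x0)                                   # diffusion-style refinement

mse0 = float(np.mean((x0 - x) ** 2))                # error before refinement
mse1 = float(np.mean((xhat - x) ** 2))              # error after refinement
print(mse1 < mse0)
```

In the surveyed methods the smoothness energy is replaced by a learned score/denoiser and the refinement runs as a full reverse-diffusion sampler, but the overall shape is the same: transmit a compact embedding, then spend decoder-side computation pulling the reconstruction toward the data distribution.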