🤖 AI Summary
Existing diffusion-based codecs rely on text-conditioned pre-trained foundation models (e.g., Stable Diffusion), making them ill-suited for image compression—especially at ultra-low bitrates. This work introduces CoD, the first diffusion foundation model trained from scratch specifically for compression, eliminating text conditioning and enabling end-to-end optimization on pure image data. CoD establishes a pixel-level diffusion codec framework supporting dual-path compression in both latent and pixel spaces, achieving superior trade-offs between PSNR fidelity and perceptual quality. At 0.0039 bpp, it sets a new state-of-the-art. Notably, CoD's parameter count is substantially smaller than that of GAN-based codecs, and it trains roughly 300× faster than prior diffusion foundation models (~20 vs. ~6,250 A100 GPU-days). The code will be publicly released.
📝 Abstract
Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, hindering the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address this, we introduce **CoD**, the first **Co**mpression-oriented **D**iffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed to serve various diffusion-based codecs. It offers several advantages: **High compression efficiency** — replacing Stable Diffusion with CoD in downstream codecs such as DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp). **Low-cost and reproducible training** — roughly 300× faster training than Stable Diffusion (~20 vs. ~6,250 A100 GPU-days) on entirely open, image-only datasets. **New insights** — for example, we find that pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Code will be released.
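To put the ultra-low bitrate claim in perspective, here is a quick back-of-envelope calculation of the compressed size it implies. The 768×512 resolution below is an assumption chosen for illustration (the standard Kodak benchmark size), not a detail stated in the abstract:

```python
def bytes_per_image(bpp: float, width: int, height: int) -> float:
    """Total compressed size in bytes for an image coded at `bpp` bits per pixel."""
    return bpp * width * height / 8


# 0.0039 bpp on a hypothetical 768x512 image (Kodak-sized, assumed for scale):
size = bytes_per_image(0.0039, 768, 512)
print(f"{size:.0f} bytes")  # roughly 192 bytes for the whole image
```

At this rate the entire image fits in under 200 bytes, which is why text-free, compression-oriented conditioning matters so much at the extreme end of the rate range.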