CoD: A Diffusion Foundation Model for Image Compression

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based codecs rely on text-conditioned pre-trained foundation models (e.g., Stable Diffusion), making them ill-suited for image compression—especially at ultra-low bitrates. This work introduces CoD, the first diffusion foundation model trained from scratch specifically for compression, eliminating text conditioning and enabling end-to-end optimization on pure image data. CoD establishes a pixel-level diffusion codec framework supporting dual-path compression in both latent and pixel spaces, achieving superior trade-offs between PSNR fidelity and perceptual quality. At 0.0039 bpp, it sets a new state-of-the-art. Notably, CoD's parameter count is substantially smaller than GAN-based codecs', and its training costs roughly 300× less than Stable Diffusion's (~20 vs. ~6,250 A100 GPU-days). The code will be publicly released.

📝 Abstract
Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, hindering the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address this, we introduce CoD, the first Compression-oriented Diffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed for various diffusion-based codecs. It offers several advantages: high compression efficiency—replacing Stable Diffusion with CoD in downstream codecs like DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp); low-cost and reproducible training—300× faster training than Stable Diffusion (~20 vs. ~6,250 A100 GPU days) on entirely open image-only datasets; and new insights—e.g., we find pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Codes will be released.
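To put the headline rate in perspective, bits-per-pixel translates directly into file size. A quick calculation (image dimensions chosen for illustration, not taken from the paper) shows how little data 0.0039 bpp leaves for an entire image:

```python
def compressed_size_bytes(width: int, height: int, bpp: float) -> float:
    """Total compressed size in bytes for an image coded at `bpp` bits per pixel."""
    return width * height * bpp / 8

# A 768x512 image (Kodak-like dimensions, used here only as an example)
# at the paper's 0.0039 bpp:
size = compressed_size_bytes(768, 512, 0.0039)
print(round(size))  # -> 192 bytes for the whole image
```

At under 200 bytes per image, the bitstream cannot carry pixel detail; a generative decoder must synthesize most of the texture, which is why ultra-low-bitrate codecs lean on diffusion models.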
Problem

Research questions and friction points this paper is trying to address.

Developing a diffusion foundation model optimized for image compression
Overcoming limitations of text-conditioned models for ultra-low bitrate compression
Enabling efficient training and high-quality compression with pixel-space diffusion
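The last point above concerns pixel-space diffusion. As background, a single reverse (denoising) step of a generic DDPM operating directly on pixels can be sketched as follows; this is textbook DDPM, not CoD's actual sampler, and the beta schedule and epsilon-predictor are stand-ins:

```python
import numpy as np

def ddpm_reverse_step(x_t, eps_pred, t, betas, rng):
    """One generic DDPM reverse step in pixel space (illustration only).

    x_t:      noisy image at timestep t
    eps_pred: the denoiser network's noise estimate for x_t (assumed given)
    betas:    variance schedule, e.g. np.linspace(1e-4, 0.02, T)
    """
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    alpha_bar_t = np.prod(1.0 - betas[: t + 1])  # cumulative product up to t
    # Posterior mean: remove the predicted noise, rescale.
    mean = (x_t - beta_t / np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_t)
    if t > 0:
        # Add fresh Gaussian noise at all but the final step.
        return mean + np.sqrt(beta_t) * rng.standard_normal(x_t.shape)
    return mean

# Usage sketch: iterate t = T-1 ... 0, obtaining eps_pred from a trained
# denoiser at each step, starting from x_T ~ N(0, I).
```

In a diffusion codec, the bitstream biases this sampling loop toward the source image; running the chain in pixel space (rather than a VAE latent) is what lets the paper report VTM-level PSNR alongside high perceptual quality.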
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed compression-oriented diffusion foundation model
Achieved state-of-the-art ultra-low bitrate compression
Enabled 300× faster training than Stable Diffusion