🤖 AI Summary
Existing diffusion-based image compression methods suffer from slow inference and low reconstruction fidelity at ultra-low bitrates because they reconstruct images iteratively from pure noise. This work proposes a relay residual diffusion paradigm: starting from noisy compressed features rather than pure noise, it derives a novel residual diffusion equation that eliminates redundant denoising steps; additionally, it introduces a fixed-step fine-tuning strategy to bridge the discrepancy between training and inference step counts. Built upon the Stable Diffusion framework, the approach integrates compressed-feature initialization, residual modeling, and end-to-end fine-tuning. Under extreme compression ratios (e.g., 0.01–0.05 bpp), the method achieves state-of-the-art visual fidelity, surpassing prior diffusion-based compressors in both distortion metrics (PSNR/MS-SSIM) and perceptual quality, while accelerating inference by up to 4×. This represents a dual gain in reconstruction accuracy and computational efficiency for diffusion-based image compression.
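The relay initialization described above can be illustrated with a minimal sketch. All names, the 1000-step schedule, and the choice of starting step below are illustrative assumptions, not values from the paper: the point is only that noising the compressed latent to an intermediate step, instead of sampling pure noise at the final step, skips the early portion of the denoising trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1000-step noise schedule (assumed, not the paper's):
# alpha_bar[t] is the cumulative signal fraction remaining at step t.
T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

def relay_start(z_compressed, t_start):
    """Noise the compressed latent to intermediate step t_start instead of
    drawing pure noise at step T - 1; denoising then needs only
    t_start + 1 steps rather than T."""
    eps = rng.standard_normal(z_compressed.shape)
    return (np.sqrt(alpha_bar[t_start]) * z_compressed
            + np.sqrt(1.0 - alpha_bar[t_start]) * eps)

z_c = np.ones(16)                       # stand-in for compressed latent features
z_init = relay_start(z_c, t_start=249)  # skips 750 of the 1000 steps
```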
📝 Abstract
Diffusion-based extreme image compression methods have achieved impressive performance at extremely low bitrates. However, constrained by the iterative denoising process that starts from pure noise, these methods are limited in both fidelity and efficiency. To address these two issues, we present Relay Residual Diffusion Extreme Image Compression (RDEIC), which leverages compressed feature initialization and residual diffusion. Specifically, we first use the compressed latent features of the image with added noise, instead of pure noise, as the starting point, eliminating the unnecessary initial stages of the denoising process. Second, we directly derive a novel residual diffusion equation from Stable Diffusion's original diffusion equation; it reconstructs the raw image by iteratively removing both the added noise and the residual between the compressed and target latent features. In this way, we effectively combine the efficiency of residual diffusion with the powerful generative capability of Stable Diffusion. Third, we propose a fixed-step fine-tuning strategy to eliminate the discrepancy between the training and inference phases, thereby further improving the reconstruction quality. Extensive experiments demonstrate that the proposed RDEIC achieves state-of-the-art visual quality and outperforms existing diffusion-based extreme image compression methods in both fidelity and efficiency. The source code will be made available at https://github.com/huai-chang/RDEIC.
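The denoising loop the abstract describes can be sketched as follows. This is a hypothetical DDIM-style stand-in, not the paper's actual equation: `relay_residual_sample`, `model`, and the toy schedule are all assumptions, with `model` representing the fine-tuned network that predicts the combined noise-plus-residual term at each step.

```python
import numpy as np

rng = np.random.default_rng(0)

def relay_residual_sample(z_c, model, alpha_bar, n_steps):
    """Sketch of a relay residual denoising loop (names hypothetical).

    Sampling starts from the noised compressed latent z_c rather than pure
    noise; each step removes the predicted noise-plus-residual term, so the
    iterate drifts from the compressed latent toward the target latent."""
    t0 = n_steps - 1
    # Relay start: noise z_c only up to the highest step actually used.
    z = (np.sqrt(alpha_bar[t0]) * z_c
         + np.sqrt(1.0 - alpha_bar[t0]) * rng.standard_normal(z_c.shape))
    for t in range(t0, -1, -1):
        pred = model(z, z_c, t)  # predicted noise + latent residual
        # Clean-latent estimate implied by the prediction (DDIM-style update).
        z0_hat = (z - np.sqrt(1.0 - alpha_bar[t]) * pred) / np.sqrt(alpha_bar[t])
        if t > 0:
            z = np.sqrt(alpha_bar[t - 1]) * z0_hat \
                + np.sqrt(1.0 - alpha_bar[t - 1]) * pred
        else:
            z = z0_hat
    return z

# Toy usage: 4-step schedule and a dummy model that predicts zero everywhere.
alpha_bar = np.linspace(0.999, 0.05, 4)
z_hat = relay_residual_sample(np.ones(8),
                              lambda z, zc, t: np.zeros_like(z),
                              alpha_bar, 4)
```

Using a small fixed `n_steps` here mirrors the fixed-step fine-tuning idea: the network is tuned under the same step count used at inference, so training and inference trajectories match.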