🤖 AI Summary
Conventional pre-processing for image compression is rate-distortion (R-D) optimized and therefore favors pixel-level fidelity over perceptual quality. This paper instead proposes rate-perception (R-P) optimization and is, by the authors' account, the first to adapt a large-scale pre-trained diffusion model for compression pre-processing. The method has two stages: (1) Stable Diffusion 2.1 is distilled into a one-step image-to-image model using Consistent Score Identity Distillation (CiD); (2) the attention modules of the distilled model are fine-tuned in a parameter-efficient way, guided by a rate-perception loss and a differentiable codec surrogate. The resulting generative pre-processor works with standard codecs without modifying them and uses diffusion priors to enhance texture and mitigate artifacts. On the Kodak dataset, the method achieves up to a 30.13% BD-rate reduction measured with DISTS, and the authors also report superior subjective visual quality.
📝 Abstract
Preprocessing is a well-established technique for optimizing compression, yet existing methods are predominantly Rate-Distortion (R-D) optimized and constrained by pixel-level fidelity. This work shifts towards Rate-Perception (R-P) optimization by adapting, for the first time, a large-scale pre-trained diffusion model for compression preprocessing. We propose a two-stage framework. First, we distill the multi-step Stable Diffusion 2.1 into a compact, one-step image-to-image model using Consistent Score Identity Distillation (CiD). Second, we perform parameter-efficient fine-tuning of the distilled model's attention modules, guided by a Rate-Perception loss and a differentiable codec surrogate. Our method integrates with standard codecs without any modification and leverages the model's generative priors to enhance texture and mitigate artifacts. Experiments show substantial R-P gains, with up to a 30.13% BD-rate reduction in DISTS on the Kodak dataset and superior subjective visual quality.
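For readers unfamiliar with the BD-rate figure quoted above, the following is a minimal sketch of how a Bjontegaard delta rate is commonly computed, here with a perceptual score such as DISTS on the quality axis. It uses the widely used cubic-fit variant; the abstract does not describe the paper's evaluation script, so this is not necessarily the authors' procedure. The function name `bd_rate` and all sample numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test):
    """Average bitrate change (%) of `test` vs `anchor` at equal quality.

    Negative values mean the test curve needs fewer bits for the same score.
    Assumes at least four rate/quality points per curve (cubic fit).
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(qual_anchor, dtype=float)
    qt = np.asarray(qual_test, dtype=float)

    # Fit log-rate as a cubic polynomial of the quality score for each curve.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Compare only over the quality interval covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if hi <= lo:
        raise ValueError("quality ranges of the two curves do not overlap")

    # Definite integrals of both fitted polynomials over [lo, hi].
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical rate (bpp) / DISTS points; lower DISTS means better quality.
dists_scores = [0.20, 0.15, 0.11, 0.08]
anchor_bpp = [0.2, 0.4, 0.7, 1.1]
# Illustrative test curve: same DISTS scores reached with 30% fewer bits.
test_bpp = [0.7 * r for r in anchor_bpp]

print(f"BD-rate: {bd_rate(anchor_bpp, dists_scores, test_bpp, dists_scores):.2f}%")
```

Because the integration runs over the overlapping score interval, the same routine applies whether the quality metric is higher-is-better (PSNR) or lower-is-better (DISTS).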