🤖 AI Summary
Existing training-free perceptual image codecs rely on diffusion inversion or sample communication, resulting in per-image decoding times on the order of minutes—severely limiting practicality. This paper proposes a training-free, encoder-agnostic perceptual enhancement method operating solely at the decoder side, compatible with arbitrary (including non-differentiable) codecs such as VTM, ELIC, and MS-ILLM. Leveraging a pre-trained unconditional generative model, the approach employs gradient approximation and iterative refinement to achieve millisecond-to-second decoding latency, while providing theoretically grounded perception-distortion trade-offs. Under decoding budgets of 0.1-10 seconds, the method matches the Fréchet Inception Distance (FID) of minute-scale alternatives and significantly outperforms conditional generative approaches including HiFiC and MS-ILLM. Moreover, it substantially improves the perceptual quality of mainstream standard codecs without modifying their encoding pipelines.
📝 Abstract
Training-free perceptual image codecs adopt a pre-trained unconditional generative model during decoding to avoid training a new conditional generative model. However, they rely heavily on diffusion inversion or sample communication, which takes from about one minute to an intractable amount of time to decode a single image. In this paper, we propose a training-free algorithm that improves the perceptual quality of any existing codec with a theoretical guarantee. We further propose different implementations for optimal perceptual quality under decoding time budgets of $\approx 0.1$ s, $0.1$-$10$ s, and $\geq 10$ s. Our approach: 1) reduces the decoding time of training-free codecs from about one minute to $0.1$-$10$ s with comparable perceptual quality; 2) can be applied to non-differentiable codecs such as VTM; 3) can improve previous perceptual codecs, such as MS-ILLM; and 4) easily achieves a perception-distortion trade-off. Empirically, we show that our approach improves the perceptual quality of ELIC, VTM, and MS-ILLM with fast decoding. It achieves FID comparable to previous training-free codecs with significantly less decoding time, and it still outperforms previous codecs based on conditional generative models, such as HiFiC and MS-ILLM, in terms of FID. The source code is provided in the supplementary material.
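To make the mechanism described above concrete, the sketch below illustrates one way a decoder-side refinement loop with an approximated gradient could look. This is not the paper's implementation: the SPSA-style finite-difference estimator, the `distortion` and `score` callables, and all step sizes are illustrative assumptions. The key idea it demonstrates is that the codec's distortion term only needs black-box evaluations, so a non-differentiable codec such as VTM can be plugged in.

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_gradient(f, x, eps=1e-3):
    # Simultaneous-perturbation (SPSA-style) gradient estimate of f at x.
    # Needs only two evaluations of f, so f may wrap a non-differentiable
    # black-box codec. (Illustrative choice; not necessarily the paper's.)
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    return (f(x + eps * delta) - f(x - eps * delta)) / (2.0 * eps) * delta

def refine(x0, distortion, score, steps=200, lr=0.05):
    # Decoder-side iterative refinement (hypothetical sketch): move the
    # decoded image x toward higher likelihood under a pre-trained
    # unconditional prior (its score function) while a distortion term
    # keeps it near the codec output. The distortion gradient is
    # approximated, so the codec itself is never differentiated.
    x = x0.copy()
    for _ in range(steps):
        x = x + lr * (score(x) - spsa_gradient(distortion, x))
    return x
```

A smaller `lr` or fewer `steps` trades perceptual refinement for decoding speed, mirroring the time-budget regimes discussed in the abstract.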