Fast Training-free Perceptual Image Compression

📅 2025-06-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing no-training perceptual image encoders rely on diffusion inversion or sample communication, resulting in per-image decoding times on the order of minutes—severely limiting practicality. This paper proposes a training-free, encoder-agnostic perceptual enhancement method operating solely at the decoder side, compatible with arbitrary (including non-differentiable) codecs such as VTM, ELIC, and MS-ILLM. Leveraging a pre-trained unconditional generative model, our approach employs gradient approximation and iterative refinement to achieve millisecond-to-second decoding latency, while providing theoretically grounded perceptual-distortion trade-offs. Under decoding budgets of 0.1–10 seconds, our method matches the Fréchet Inception Distance (FID) of minute-scale alternatives and significantly outperforms conditional generative approaches including HiFiC and MS-ILLM. Moreover, it substantially improves the perceptual quality of mainstream standard codecs without modifying their encoding pipelines.
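The summary describes decoder-side iterative refinement guided by a pre-trained unconditional generative prior, with gradient approximation to handle non-differentiable codecs. A minimal illustrative sketch of that idea is below; it is not the paper's actual algorithm, and `score_fn`, `codec_round_trip`, and all hyperparameters are hypothetical stand-ins (a finite-difference estimate is used here as one common zeroth-order gradient approximation).

```python
import numpy as np

def refine(decoded, score_fn, codec_round_trip, steps=10, lr=0.1, eps=1e-2):
    """Illustrative decoder-side refinement (assumed, not the paper's exact method).

    Pulls the decoded image toward a frozen generative prior (via `score_fn`,
    a stand-in for a pre-trained score/denoiser network) while keeping it
    consistent with the black-box codec's reconstruction. The codec may be
    non-differentiable, so its consistency gradient is estimated with a
    two-point finite difference along a random direction.
    """
    x = decoded.copy()
    for _ in range(steps):
        # Prior direction: move toward higher likelihood under the prior.
        prior_grad = score_fn(x)

        # Consistency loss: squared drift of the codec round-trip from the
        # original decode. Zeroth-order estimate avoids codec gradients.
        u = np.random.randn(*x.shape)
        loss = lambda y: np.sum((codec_round_trip(y) - decoded) ** 2)
        fd_grad = (loss(x + eps * u) - loss(x - eps * u)) / (2 * eps) * u

        x = x + lr * (prior_grad - fd_grad)
    return x
```

With the step count as the knob, this kind of loop naturally trades decoding time for perceptual quality, which matches the 0.1–10 s budget framing in the summary.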

📝 Abstract
Training-free perceptual image codecs adopt a pre-trained unconditional generative model during decoding to avoid training a new conditional generative model. However, they rely heavily on diffusion inversion or sample communication, which take from 1 minute to an intractable amount of time to decode a single image. In this paper, we propose a training-free algorithm that improves the perceptual quality of any existing codec with a theoretical guarantee. We further propose different implementations for optimal perceptual quality when the decoding time budget is $\approx 0.1$ s, $0.1$–$10$ s, and $\ge 10$ s. Our approach: 1) improves the decoding time of training-free codecs from 1 minute to $0.1$–$10$ s with comparable perceptual quality; 2) can be applied to non-differentiable codecs such as VTM; 3) can be used to improve previous perceptual codecs, such as MS-ILLM; and 4) can easily achieve a perception-distortion trade-off. Empirically, we show that our approach successfully improves the perceptual quality of ELIC, VTM, and MS-ILLM with fast decoding. Our approach achieves FID comparable to previous training-free codecs with significantly less decoding time, and it still outperforms previous codecs based on conditional generative models, such as HiFiC and MS-ILLM, in terms of FID. The source code is provided in the supplementary material.
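The abstract claims the method "can easily achieve perception-distortion trade-off". One common way such a trade-off is traversed (an assumption here, not necessarily the paper's mechanism) is by blending the distortion-optimal decode with the perceptually refined sample; `x_mse` and `x_perceptual` below are hypothetical names.

```python
import numpy as np

def tradeoff_blend(x_mse, x_perceptual, alpha):
    """Illustrative perception-distortion traversal (assumed mechanism).

    alpha = 0 returns the distortion-optimal reconstruction; alpha = 1
    returns the perceptually refined sample; intermediate values trace
    a perception-distortion curve between the two.
    """
    return (1.0 - alpha) * x_mse + alpha * x_perceptual

def psnr(x, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB of `x` against reference `ref`."""
    mse = np.mean((x - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

As `alpha` grows, distortion against the reference (e.g. measured by PSNR) degrades monotonically while perceptual quality improves, which is the trade-off the abstract refers to.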
Problem

Research questions and friction points this paper is trying to address.

Improves perceptual image compression without training
Reduces decoding time from minutes to seconds
Enhances existing codecs like VTM and MS-ILLM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pre-trained generative model for decoding
Improves decoding time to 0.1-10 seconds
Applicable to non-differentiable codecs like VTM
Ziran Zhu
Institute for AI Industry Research, Tsinghua University, Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Tongda Xu
PhD candidate, Tsinghua University
Image & Video Compression · Perceptual Quality · Old Beijing native & internet-café master
Minye Huang
Harbin Institute of Technology
Dailan He
PhD candidate @The Chinese University of Hong Kong
Computer Vision · Deep Learning · Image & Video Compression · Image & Video Generation
Xingtong Ge
Hong Kong University of Science and Technology, SenseTime, Beijing Institute of Technology
Diffusion Models · Image/Video Compression · Gaussian Splatting
Xinjie Zhang
Researcher, Microsoft Research Asia
Multimodal Understanding and Generation · Neural Compression · Gaussian Splatting
Ling Li
Institute for AI Industry Research, Tsinghua University, Institute of Software, Chinese Academy of Sciences
Yan Wang
Institute for AI Industry Research, Tsinghua University, Department of Computer Science and Technology, Tsinghua University