DLF: Extreme Image Compression with Dual-generative Latent Fusion

📅 2025-03-03
🤖 AI Summary
Existing extreme image compression methods compress tokens from generative tokenizers and tend to cluster semantics that are common across the dataset. This loses the fine-grained details of individual objects and degrades reconstruction fidelity at ultra-low bitrates. To address this, the paper proposes Dual-generative Latent Fusion (DLF), a two-branch paradigm that decomposes the latent into semantic and detail elements: a semantic branch clusters high-level information into compact tokens, while a detail branch encodes perceptually critical details. A cross-branch interactive design reduces redundancy between the two branches and thereby lowers the overall bit cost. DLF reconstructs images well even below 0.01 bpp. On the CLIC2020 test set, it achieves bitrate savings of up to 27.93% on LPIPS and 53.55% on DISTS relative to MS-ILLM, and it surpasses recent diffusion-based codecs in visual fidelity while maintaining comparable generative realism.

📝 Abstract
Recent studies in extreme image compression have achieved remarkable performance by compressing the tokens from generative tokenizers. However, these methods often prioritize clustering common semantics within the dataset, while overlooking the diverse details of individual objects. Consequently, this results in suboptimal reconstruction fidelity, especially at low bitrates. To address this issue, we introduce a Dual-generative Latent Fusion (DLF) paradigm. DLF decomposes the latent into semantic and detail elements, compressing them through two distinct branches. The semantic branch clusters high-level information into compact tokens, while the detail branch encodes perceptually critical details to enhance the overall fidelity. Additionally, we propose a cross-branch interactive design to reduce redundancy between the two branches, thereby minimizing the overall bit cost. Experimental results demonstrate the impressive reconstruction quality of DLF even below 0.01 bits per pixel (bpp). On the CLIC2020 test set, our method achieves bitrate savings of up to 27.93% on LPIPS and 53.55% on DISTS compared to MS-ILLM. Furthermore, DLF surpasses recent diffusion-based codecs in visual fidelity while maintaining a comparable level of generative realism. Code will be available later.
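The "bitrate savings" figures compare two codecs at equal perceptual quality. The abstract does not say how the savings were computed. A common convention is to average over a range of quality levels (BD-rate), whereas the minimal sketch below evaluates a single operating point only. The function names and all rate/score points are hypothetical and are not taken from the paper.

```python
import math

def rate_at_quality(curve, target):
    """Interpolate the bitrate (bpp) needed to reach a target score.

    curve: list of (bpp, score) points; lower score is better, as for
    LPIPS and DISTS. Interpolation is linear in the log-bitrate domain.
    """
    pts = sorted(curve, key=lambda p: p[1])  # ascending score
    for (r0, s0), (r1, s1) in zip(pts, pts[1:]):
        if s1 > s0 and s0 <= target <= s1:
            t = (target - s0) / (s1 - s0)
            log_rate = math.log(r0) + t * (math.log(r1) - math.log(r0))
            return math.exp(log_rate)
    raise ValueError("target score outside the measured range")

def bitrate_saving(anchor, test, target):
    """Percent bitrate saved by `test` over `anchor` at the same score."""
    anchor_rate = rate_at_quality(anchor, target)
    test_rate = rate_at_quality(test, target)
    return 100.0 * (anchor_rate - test_rate) / anchor_rate

# Illustrative numbers only (NOT results from the paper).
anchor_curve = [(0.01, 0.40), (0.02, 0.30), (0.04, 0.20)]
test_curve = [(0.005, 0.40), (0.01, 0.30), (0.02, 0.20)]
saving = bitrate_saving(anchor_curve, test_curve, target=0.30)
print(f"{saving:.1f}% fewer bits at the same score")
```

In this made-up example the test codec needs half the bits of the anchor at every score, so the saving is 50%.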
Problem

Research questions and friction points this paper is trying to address.

Improves image compression by separating semantic and detail elements.
Reduces redundancy between semantic and detail encoding branches.
Enhances reconstruction fidelity at extremely low bitrates.
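To get a feel for how tight an "extremely low bitrate" budget is, the short calculation below converts a bits-per-pixel rate into a total bit budget. The 768×512 image size is an arbitrary illustrative choice, not a resolution taken from the paper.

```python
def bit_budget(width, height, bpp):
    """Return (bits, bytes) available for an image at a given bpp rate."""
    bits = width * height * bpp
    return bits, bits / 8

# Hypothetical 768x512 image at 0.01 bpp.
bits, nbytes = bit_budget(768, 512, 0.01)
print(f"{bits:.0f} bits, about {nbytes:.0f} bytes")
```

At 0.01 bpp, an image of this size must be described in roughly 492 bytes. That is why methods in this regime rely on generative priors rather than coding pixel-level residuals directly.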
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-generative Latent Fusion for image compression
Separates semantic and detail elements for compression
Cross-branch interaction reduces redundancy, lowers bit cost
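The division of labor between the two branches can be illustrated with a deliberately simple analogy: a vector-quantization step stands in for the semantic branch, and a coarsely quantized residual stands in for the detail branch. This is a toy sketch under strong assumptions, not the DLF architecture, which uses generative tokenizers and learned networks. The codebook, step size, and function names are all hypothetical.

```python
def nearest(codebook, x):
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(codebook[i], x))
    return min(range(len(codebook)), key=dist)

def encode(x, codebook, step):
    # "Semantic" part: snap the input to a shared cluster centre.
    idx = nearest(codebook, x)
    # "Detail" part: code only what the semantic part missed, so the
    # two parts do not spend bits on the same information.
    residual = [a - b for a, b in zip(x, codebook[idx])]
    detail = [round(r / step) for r in residual]
    return idx, detail

def decode(idx, detail, codebook, step):
    # Fusion: cluster centre plus dequantized detail.
    return [c + q * step for c, q in zip(codebook[idx], detail)]

# Toy data (hypothetical, for illustration only).
codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
x = [0.9, 1.2, 1.0, 0.7]
idx, detail = encode(x, codebook, step=0.25)
x_hat = decode(idx, detail, codebook, step=0.25)
print(idx, detail, x_hat)
```

Here the cluster index alone reconstructs every element as 1.0. Adding the small integer detail codes brings each element to within half a quantization step of the input.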
Authors
Naifu Xue
Communication University of China
Zhaoyang Jia
University of Science and Technology of China
Video compression, digital watermarking
Jiahao Li
Microsoft Research Asia
Bin Li
Microsoft Research Asia
Yuan Zhang
Communication University of China
Yan Lu
Microsoft Research Asia