🤖 AI Summary
Existing neural image compression methods struggle to simultaneously preserve semantic fidelity and perceptual realism at ultra-low bitrates: explicit representations often lack textural detail, while implicit ones are prone to semantic drift. This work proposes the first training-free unified framework that jointly optimizes semantics and perception, guiding a diffusion model with explicit high-level semantic information while conveying fine-grained textures implicitly via reverse-channel coding. The approach integrates explicit semantic and implicit texture representations and introduces a plug-in encoder to flexibly balance the distortion-perception trade-off. Evaluated on Kodak, DIV2K, and CLIC2020, the method achieves state-of-the-art perceptual performance, surpassing DiffC by 29.92%, 19.33%, and 20.89% in DISTS BD-Rate, respectively.
📝 Abstract
While recent neural codecs achieve strong performance at low bitrates when optimized for perceptual quality, their effectiveness deteriorates significantly under ultra-low bitrate conditions. To mitigate this, generative compression methods leveraging semantic priors from pretrained models have emerged as a promising paradigm. However, existing approaches are fundamentally constrained by a trade-off between semantic faithfulness and perceptual realism. Methods based on explicit representations preserve content structure but often lack fine-grained textures, whereas implicit methods can synthesize visually plausible details at the cost of semantic drift. In this work, we propose a unified framework that bridges this gap by coherently integrating explicit and implicit representations in a training-free manner. Specifically, we condition a diffusion model on explicit high-level semantics while employing reverse-channel coding to implicitly convey fine-grained details. Moreover, we introduce a plug-in encoder that enables flexible control of the distortion-perception trade-off by modulating the implicit information. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art rate-perception performance, outperforming existing methods and surpassing DiffC by 29.92%, 19.33%, and 20.89% in DISTS BD-Rate on the Kodak, DIV2K, and CLIC2020 datasets, respectively.
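The reported gains are measured in BD-Rate, the standard Bjøntegaard metric for comparing rate-quality curves: fit each codec's log-bitrate as a polynomial of the quality score, integrate both fits over the shared quality range, and report the average bitrate difference as a percentage. A minimal sketch is below; the rate and quality values are hypothetical placeholders, not numbers from the paper, and since DISTS is a lower-is-better distance, a monotone transform (e.g. `1 - DISTS`) would typically serve as the quality axis in practice.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjontegaard Delta Rate: average % bitrate difference at equal quality.

    Fits a cubic polynomial of log10(rate) as a function of quality for each
    codec and integrates both fits over the overlapping quality interval.
    Negative values mean the test codec needs less bitrate than the anchor
    to reach the same quality.
    """
    log_rate_a = np.log10(rate_anchor)
    log_rate_t = np.log10(rate_test)
    # Cubic fit of log-rate vs. quality (four or more RD points per codec).
    poly_a = np.polyfit(quality_anchor, log_rate_a, 3)
    poly_t = np.polyfit(quality_test, log_rate_t, 3)
    # Integrate only over the quality range covered by both curves.
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    # Convert the average log10-rate gap back to a percentage.
    return (10.0 ** avg_log_diff - 1.0) * 100.0

# Illustrative sanity check with made-up numbers: a codec that halves the
# bitrate at every quality level yields a BD-Rate of about -50%.
rates = [100.0, 200.0, 400.0, 800.0]   # hypothetical kbps
quality = [0.80, 0.85, 0.90, 0.95]     # hypothetical quality scores
print(bd_rate(rates, quality, [r / 2 for r in rates], quality))
```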