Hierarchical Semantic Compression for Consistent Image Semantic Restoration

📅 2025-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of jointly preserving fidelity and semantic consistency in semantic compression, this paper proposes a novel hierarchical semantic compression framework grounded in the intrinsic latent space of generative models. Methodologically: (1) a general inversion encoder is constructed to map images into a clean semantic latent space; (2) a feature compression network (FCN) and a semantic compression network (SCN) are designed together to compress middle-level features and core semantics hierarchically at low bitrates; (3) a channel-wise contextual entropy model and semantic consistency regularization are introduced to improve coding efficiency and semantic coherence. Experiments show state-of-the-art subjective quality and semantic consistency, and better results on downstream vision tasks such as object detection and semantic segmentation, with bitstreams that align more closely with how the human visual system understands images.

📝 Abstract
Semantic compression has recently attracted growing research interest because it can achieve high-fidelity restoration even at extremely low bitrates. However, existing semantic compression methods typically combine standard pipelines with either pre-defined or high-dimensional semantics, and therefore compress inefficiently. To address this issue, we propose a novel hierarchical semantic compression (HSC) framework that operates purely within the intrinsic semantic spaces of generative models, enabling efficient compression with consistent semantic restoration. More specifically, we first analyse the entropy models for semantic compression, which motivates a hierarchical architecture based on a newly developed general inversion encoder. We then propose a feature compression network (FCN) and a semantic compression network (SCN), so that the middle-level semantic features and the core semantics are compressed hierarchically to restore both the accuracy and the consistency of image semantics, using an entropy model that is progressively shared through channel-wise context. Experimental results demonstrate that the proposed HSC framework achieves state-of-the-art subjective quality and consistency for human vision, together with superior performance on machine vision tasks that operate on the compressed bitstreams. This coincides with how the human visual system understands images, and thus provides a new framework for future image/video compression paradigms. Our code will be released upon acceptance.
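The channel-wise context idea mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' entropy model: the discretized Gaussian likelihood, the fixed `sigma`, the toy `latents`, and the rule "predict each channel slice's mean from the previously coded slice" are all assumptions chosen only to show why conditioning on already-decoded channels can lower the estimated bitrate. A learned context network would replace the copy-the-previous-slice predictor.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def symbol_bits(y, mu, sigma):
    """Code length in bits of integer y under a Gaussian discretized to unit-width bins."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0)


def estimate_bits(slices, use_context, sigma=1.0):
    """Sum the estimated code length over channel slices.

    With use_context=True, the mean of each symbol is taken from the
    co-located symbol of the previously coded slice (a toy stand-in for
    a learned context model). The decoder can form the same prediction
    because that slice has already been decoded.
    """
    total = 0.0
    previous = None
    for current in slices:
        for i, y in enumerate(current):
            if use_context and previous is not None:
                mu = float(previous[i])
            else:
                mu = 0.0
            total += symbol_bits(y, mu, sigma)
        previous = current
    return total


# Hypothetical quantized latents: 3 channel slices, 4 positions each,
# deliberately correlated across channels.
latents = [
    [4, -3, 5, 0],
    [4, -2, 5, 1],
    [5, -3, 4, 0],
]

bits_independent = estimate_bits(latents, use_context=False)
bits_contextual = estimate_bits(latents, use_context=True)
print(f"independent: {bits_independent:.1f} bits, contextual: {bits_contextual:.1f} bits")
```

In this toy setting the first slice costs the same either way, since there is nothing to condition on yet. The saving comes from the later slices, whose residuals relative to the previous slice are small.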
Problem

Research questions and friction points this paper is trying to address.

Hierarchical semantic compression for image restoration
Efficient compression in intrinsic semantic spaces
Consistent semantic restoration via hierarchical architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical semantic compression framework
General inversion encoder architecture
Feature and semantic compression networks
Shengxi Li
School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
Zifu Zhang
School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
Lai Jiang
School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
Mai Xu
Beihang University, Tsinghua University, Imperial College London
Yufan Liu
Institute of Automation, Chinese Academy of Sciences
Image/video processing, Knowledge Distillation, Saliency detection, Model compression, Video coding
Ce Zhu
FIEEE, University of Electronic Science and Technology of China
Visual Information Processing, Visual Coding & Communications, Machine Learning with Applications