🤖 AI Summary
This work proposes a semantic hierarchy-aware progressive image compression framework that explicitly incorporates semantic structure into the compression process. While existing methods primarily focus on sample-level difficulty adaptation, they often fail to preserve high-level semantic information effectively at low bitrates. To address this limitation, the proposed approach leverages CLIP embeddings to construct a semantic hierarchy over ImageNet-1K classes and explicitly decomposes and optimizes corresponding channel blocks in the latent representation. This enables semantically scalable transmission—from coarse- to fine-grained semantics—within a single bitstream. Experimental results demonstrate that the method significantly improves coarse-grained recognition performance at low bitrates while maintaining fine-grained classification accuracy at higher bitrates, outperforming current state-of-the-art progressive codecs across the bitrate spectrum.
📝 Abstract
Recent advances in learned image compression (LIC) have enabled practical deployments, spurring active research into image compression for machines and progressive coding schemes. However, their integration remains under-explored: prior works on progressive machine codec predominantly target sample-level difficulty adaptation (i.e., easy-to-hard), without considering semantic-level scalability. In this work, we introduce a semantic hierarchy-aware progressive codec that enables semantic scalability (i.e., coarse-to-fine) from a single bitstream. We first systematically categorize ImageNet-1K classes into CLIP embedding-based semantic hierarchies. Based on a channel-wise autoregressive framework, we decompose latent representations into hierarchically ordered channel blocks, each explicitly optimized for a corresponding semantic hierarchy. Extensive experiments demonstrate that our approach substantially improves coarse-level recognition at low bitrates while maintaining fine-grained accuracy at higher bitrates. By reframing progressive transmission through the lens of semantic scalability, our work provides an efficient and interpretable solution for task-adaptive image coding, outperforming existing progressive codecs under hierarchical evaluation.