🤖 AI Summary
Existing image compression methods are designed for human vision and operate on rectangular images, so they transmit content that is semantically redundant for downstream intelligent tasks. Semantically structured approaches partition the image into rectangular regions based on semantics, but information coupling between regions induces boundary artifacts and bitrate inefficiency. This paper proposes a structured compression framework based on irregular semantic grouping: a customized group mask decomposes the image into non-rectangular, semantically coherent groups, and a Group-Independent Swin-Block enables fully decoupled, group-wise independent transformation and coding. The method eliminates boundary distortions, improves selective reconstruction fidelity and task adaptability, and significantly reduces structural overhead and bitrate, achieving an average 12.7% bitrate saving while preserving subjective quality; boundary PSNR improves by 3.2 dB.
📝 Abstract
Image compression techniques typically focus on compressing rectangular images for human consumption, which results in transmitting content that is redundant for downstream applications. To overcome this limitation, some previous works semantically structure the bitstream so that specific application requirements can be met through selective transmission and reconstruction. However, they divide the input image into multiple rectangular regions according to semantics and do not prevent information interaction among those regions, which wastes bitrate and distorts the reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple irregularly shaped groups based on a customized group mask and to compress them independently. Our group mask describes the image at a finer granularity, enabling significant bitrate savings by reducing the transmission of redundant content. Moreover, to ensure the fidelity of selective reconstruction, we propose the concept of a group-independent transform, which maintains independence among distinct groups, and we instantiate it with the proposed Group-Independent Swin-Block (GI Swin-Block). Experimental results demonstrate that our framework structures the bitstream at negligible cost and exhibits superior performance in both visual quality and support for intelligent tasks.
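The abstract does not describe the internals of the GI Swin-Block, so the following is only a minimal sketch of what a group-independent transform could look like, assuming it is realized as self-attention restricted by the group mask. The function name `group_independent_attention`, the identity Q/K/V projections, and the toy 4x4 mask are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def group_independent_attention(x, group_ids):
    """Single-head self-attention where a token attends only to its own group.

    x:         (N, C) token features of one flattened window.
    group_ids: (N,) integer group label per token, read from the group mask.
    """
    n, c = x.shape
    # Identity projections keep the sketch short (assumption);
    # a real block would use learned Q/K/V weights.
    q, k, v = x, x, x
    scores = q @ k.T / np.sqrt(c)                       # (N, N) similarities
    same_group = group_ids[:, None] == group_ids[None, :]
    scores = np.where(same_group, scores, -np.inf)      # block cross-group pairs
    scores = scores - scores.max(axis=1, keepdims=True) # stable softmax
    weights = np.exp(scores)                            # masked pairs become 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

# Toy irregular group mask over a 4x4 window (hypothetical labels 0, 1, 2).
mask = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 2, 2, 1],
                 [2, 2, 2, 1]])
ids = mask.reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
y = group_independent_attention(x, ids)

# Perturb every token outside group 0 and recompute.
x_perturbed = x.copy()
x_perturbed[ids != 0] += 5.0
y_perturbed = group_independent_attention(x_perturbed, ids)

# Group 0 outputs do not depend on the other groups' content.
group0_unchanged = np.allclose(y[ids == 0], y_perturbed[ids == 0])
others_changed = not np.allclose(y[ids != 0], y_perturbed[ids != 0])
```

In this sketch, changing the content of groups 1 and 2 leaves the output for group 0 untouched, which is the property that would allow a group to be transmitted and reconstructed on its own without boundary contamination from its neighbours.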