🤖 AI Summary
This work addresses the performance degradation of existing learning-based methods on screen content (SC) image compression, which is caused by sharp edges, embedded text and graphics, and repetitive textures. We propose the first end-to-end frequency-decomposition learning framework for SC compression. Methodologically, we design a multi-frequency two-stage octave residual block and a cascaded triple-scale feature fusion module, introduce a frequency-domain adaptive quantization mechanism, and construct SDU-SCICD10K, the first large-scale, SC-specific dataset, comprising over 10,000 images. Our key contributions are: (i) the first integration of multi-frequency modeling and adaptive quantization within a unified SC compression framework; and (ii) state-of-the-art performance: our model significantly outperforms HEVC, VVC, and leading learned codecs in both PSNR and MS-SSIM, especially at high compression ratios, while preserving text and graphic fidelity. This establishes a new paradigm for efficient SC image compression.
📝 Abstract
Learned image compression (LIC) methods have already surpassed traditional techniques in compressing natural scene (NS) images. However, directly applying these methods to screen content (SC) images, which have distinct characteristics such as sharp edges, repetitive patterns, and embedded text and graphics, yields suboptimal results. This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. To overcome these challenges, we propose a novel compression method that employs a multi-frequency two-stage octave residual block (MToRB) for feature extraction, a cascaded triple-scale feature fusion residual block (CTSFRB) for multi-scale feature integration, and a multi-frequency context interaction module (MFCIM) to reduce inter-frequency correlations. Additionally, we introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. Furthermore, we construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms. Experimental results demonstrate that our approach significantly improves SC image compression performance, outperforming traditional standards and state-of-the-art learning-based methods in terms of peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).
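The idea of scaled uniform noise per frequency component can be illustrated with a minimal sketch. This is not the paper's module: it assumes the common additive-noise proxy for quantization used in learned codecs, generalized with a step size per band. The function names, the band names `low` and `high`, and the step values are hypothetical; in a trained model the step sizes would be learned parameters rather than constants.

```python
import random

def noisy_quantize(values, step, rng):
    """Training-time proxy: add uniform noise drawn from [-step/2, step/2)."""
    return [v + (rng.random() - 0.5) * step for v in values]

def hard_quantize(values, step):
    """Inference-time quantization: snap each value to the nearest multiple of step."""
    return [round(v / step) * step for v in values]

# Hypothetical per-band step sizes (assumed constants here, learnable in practice).
steps = {"low": 0.5, "high": 2.0}
# Toy latent values, identical in both bands so the effect of the step is visible.
latents = {"low": [0.2, 1.3, -0.7], "high": [0.2, 1.3, -0.7]}

rng = random.Random(0)  # fixed seed so the noisy values are reproducible
for band, y in latents.items():
    step = steps[band]
    print(band, "noisy:", noisy_quantize(y, step, rng))
    print(band, "hard: ", hard_quantize(y, step))
```

A larger step gives coarser reconstruction levels and wider training noise, which is the sense in which a per-band scale controls quantization granularity.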