🤖 AI Summary
To address insufficient modeling of frequency-domain correlations and suboptimal rate-distortion (RD) performance in learned image compression, this paper proposes a novel end-to-end framework that integrates 3D multi-level wavelet transforms. Its core contributions are: (1) a 3D multi-level wavelet-domain convolutional layer (3DM-WeConv) that jointly reduces spatial, channel-wise, and frequency-domain redundancies; and (2) a 3D wavelet-domain channel-wise autoregressive entropy model (3DWeChARM), in which low-frequency slices are coded first and serve as priors for high-frequency slices. The framework supports JPEG 2000 wavelets such as 5/3 and 9/7, applies different-sized convolutions to different subbands, returns to the spatial domain with an inverse 3D discrete wavelet transform (DWT), and uses slice-based entropy coding. Training follows a two-step strategy: the low- and high-frequency rates are balanced first, then the model is fine-tuned with separate weights. On the Kodak, Tecnick 100, and CLIC test sets, the method achieves BD-rate reductions of 12.24%, 15.51%, and 12.97%, respectively, over H.266/VVC. It is also reported to outperform state-of-the-art CNN-based methods in both RD performance and computational complexity, with larger gains on high-resolution images.
📝 Abstract
Learned image compression (LIC) has recently made significant progress, surpassing traditional methods. However, most LIC approaches operate mainly in the spatial domain and lack mechanisms for reducing frequency-domain correlations. To address this, we propose a novel framework that integrates a low-complexity 3D multi-level discrete wavelet transform (DWT) into convolutional layers and entropy coding, reducing both spatial and channel correlations to improve frequency selectivity and rate-distortion (R-D) performance. Our proposed 3D multi-level wavelet-domain convolution (3DM-WeConv) layer first applies a 3D multi-level DWT (e.g., the 5/3 and 9/7 wavelets from JPEG 2000) to transform the data into the wavelet domain. Convolutions of different sizes are then applied to the different frequency subbands, followed by an inverse 3D DWT that returns the result to the spatial domain. The 3DM-WeConv layer can be flexibly used within existing CNN-based LIC models. We also introduce a 3D wavelet-domain channel-wise autoregressive entropy model (3DWeChARM), which performs slice-based entropy coding in the 3D DWT domain. Low-frequency (LF) slices are encoded first to provide priors for high-frequency (HF) slices. A two-step training strategy is adopted: first balancing LF and HF rates, then fine-tuning with separate weights. Extensive experiments demonstrate that our framework consistently outperforms state-of-the-art CNN-based LIC methods in R-D performance and computational complexity, with larger gains for high-resolution images. On the Kodak, Tecnick 100, and CLIC test sets, our method achieves BD-rate reductions of 12.24%, 15.51%, and 12.97%, respectively, compared to H.266/VVC.
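To make the transform step of the 3DM-WeConv layer concrete, the following is a minimal NumPy sketch of a multi-level 3D DWT and its inverse over a (channel, height, width) tensor, using the reversible 5/3 lifting steps. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the boundaries use periodic extension rather than the symmetric extension specified by JPEG 2000, each transformed dimension is assumed to be even at every level, and the learned per-subband convolutions are omitted (they would act on the subbands between the forward and inverse transforms).

```python
import numpy as np

def dwt53_1d(x, axis):
    """One level of reversible 5/3 lifting along `axis` (even length, periodic extension)."""
    x = np.moveaxis(np.asarray(x, dtype=np.int64), axis, 0)
    even, odd = x[0::2], x[1::2]
    # Predict step: detail = odd sample minus floor of the mean of its two even neighbours.
    high = odd - ((even + np.roll(even, -1, axis=0)) >> 1)
    # Update step: approximation = even sample plus floor((two neighbouring details + 2) / 4).
    low = even + ((np.roll(high, 1, axis=0) + high + 2) >> 2)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def idwt53_1d(low, high, axis):
    """Exact inverse of dwt53_1d: undo the update step, then the predict step."""
    low = np.moveaxis(low, axis, 0)
    high = np.moveaxis(high, axis, 0)
    even = low - ((np.roll(high, 1, axis=0) + high + 2) >> 2)
    odd = high + ((even + np.roll(even, -1, axis=0)) >> 1)
    x = np.empty((2 * even.shape[0],) + even.shape[1:], dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return np.moveaxis(x, 0, axis)

def dwt53_3d(x):
    """Single-level 3D DWT. Returns 8 subbands keyed 'LLL'..'HHH' (channel, height, width)."""
    bands = {"": x}
    for axis in range(3):
        nxt = {}
        for key, band in bands.items():
            nxt[key + "L"], nxt[key + "H"] = dwt53_1d(band, axis)
        bands = nxt
    return bands

def idwt53_3d(bands):
    """Inverse of dwt53_3d: merge subbands along width, then height, then channel."""
    for axis in (2, 1, 0):
        bands = {
            key: idwt53_1d(bands[key + "L"], bands[key + "H"], axis)
            for key in {k[:-1] for k in bands}
        }
    return bands[""]

def dwt53_3d_multilevel(x, levels):
    """Repeatedly decompose the LLL band. Returns (coarsest LLL, list of detail dicts)."""
    approx, details = x, []
    for _ in range(levels):
        bands = dwt53_3d(approx)
        approx = bands.pop("LLL")
        details.append(bands)
    return approx, details

def idwt53_3d_multilevel(approx, details):
    """Inverse of dwt53_3d_multilevel, rebuilding from the coarsest level outward."""
    for bands in reversed(details):
        approx = idwt53_3d({**bands, "LLL": approx})
    return approx

# Usage: a two-level decomposition of an integer feature tensor reconstructs exactly.
rng = np.random.default_rng(0)
feat = rng.integers(-128, 128, size=(8, 16, 16))
approx, details = dwt53_3d_multilevel(feat, levels=2)
recon = idwt53_3d_multilevel(approx, details)
assert np.array_equal(recon, feat)
```

Because each lifting step is undone exactly by its inverse, the integer reconstruction is lossless regardless of the boundary rule chosen here. In a learned layer the features are floating-point and a 9/7 filter could be substituted; this sketch uses the integer 5/3 variant only because its exact invertibility is easy to verify.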