AI Summary
Existing stereo image compression methods struggle to model the joint spatial and disparity dependencies of stereo pairs, which leads to suboptimal rate-distortion performance. To address this, we propose an end-to-end learnable, decoder-free, Transformer-based entropy model. Our approach introduces content-aware masked image modeling (MIM), which enables efficient bidirectional interaction between priors and target tokens to explicitly capture cross-view and spatial contextual dependencies. Because this removes the need for an extra Transformer decoder, the framework can pair independent per-view latent transforms with joint spatial-disparity entropy modeling. Experiments show state-of-the-art rate-distortion performance on Cityscapes and InStereo2K. Moreover, our method achieves significantly faster encoding and decoding than existing learned stereo codecs without sacrificing compression efficiency.
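The rate side of the rate-distortion trade-off is what an entropy model controls. The sketch below assumes a discretized Gaussian conditional model, a common choice in learned image codecs that the summary itself does not specify; the function names are hypothetical. It illustrates why better context modeling helps: a mean predicted closer to the true latent value gives that value more probability mass, so coding it costs fewer bits, and those bits enter the rate term of a standard R + λD objective.

```python
import math


def std_normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_bits(y_hat: float, mu: float, sigma: float) -> float:
    """Bits to code an integer-quantized symbol y_hat when the (assumed)
    entropy model predicts N(mu, sigma^2), integrated over the bin
    [y_hat - 0.5, y_hat + 0.5]."""
    sigma = max(sigma, 1e-6)  # guard against a zero or negative scale
    upper = std_normal_cdf((y_hat + 0.5 - mu) / sigma)
    lower = std_normal_cdf((y_hat - 0.5 - mu) / sigma)
    p = max(upper - lower, 1e-9)  # floor avoids log2(0) for very poor predictions
    return -math.log2(p)


def rd_loss(total_bits: float, num_pixels: int, mse: float, lam: float) -> float:
    """Rate-distortion objective L = R + lam * D, with R in bits per pixel."""
    return total_bits / num_pixels + lam * mse


if __name__ == "__main__":
    # Same symbol, two hypothetical predictions: one well-centred (e.g. helped
    # by cross-view context), one poorly centred.
    print(gaussian_bits(3.0, mu=3.0, sigma=1.0))  # fewer bits
    print(gaussian_bits(3.0, mu=0.0, sigma=1.0))  # many more bits
```

In a stereo codec the total rate would sum such costs over the latents of both views; how CAMSIC parameterizes its entropy model is not stated here.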
Abstract
Existing learning-based stereo image codecs adopt sophisticated transformations with simple entropy models derived from single-image codecs to encode latent representations. However, these entropy models struggle to capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework named CAMSIC. CAMSIC independently transforms each image into a latent representation and employs a powerful decoder-free Transformer entropy model that captures both spatial and disparity dependencies by introducing a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM facilitates efficient bidirectional interaction between prior information and estimated tokens, which naturally obviates the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets, Cityscapes and InStereo2K, with fast encoding and decoding speeds. Code is available at https://github.com/Xinjie-Q/CAMSIC.
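The staged coding that masked image modeling makes possible can be illustrated with a toy sketch. It is not the CAMSIC implementation: `make_schedule`, `build_inputs`, and `toy_predictor` are hypothetical stand-ins. We assume scalar tokens, a random but seeded coding order shared by encoder and decoder, and, as one possible reading of "content-aware", that each not-yet-coded position is filled with a prior-derived token (here, hypothetically, the co-located latent of the already-coded other view) instead of a single learned mask token. The sketch shows two properties: every stage reads only the prior and tokens decoded in earlier stages, so a decoder can replay the same predictions; and, unlike a raster-scan autoregressive context, each prediction may use neighbours on both sides.

```python
import random
from typing import Dict, List, Sequence


def make_schedule(num_tokens: int, num_stages: int, seed: int = 0) -> List[List[int]]:
    """Split token indices into groups coded one stage after another.
    A shared seed lets the encoder and decoder derive the same order."""
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    order = list(range(num_tokens))
    random.Random(seed).shuffle(order)
    size = -(-num_tokens // num_stages)  # ceiling division
    return [order[i:i + size] for i in range(0, num_tokens, size)]


def build_inputs(decoded: Dict[int, float], prior: Sequence[float]) -> List[float]:
    """Hypothetical content-aware masking: an unknown position is filled with
    its prior-derived token rather than one shared learned [MASK] token."""
    return [decoded[i] if i in decoded else prior[i] for i in range(len(prior))]


def toy_predictor(inputs: Sequence[float], targets: Sequence[int]) -> Dict[int, float]:
    """Hypothetical stand-in for a Transformer encoder: predicts each target's
    mean from a window that includes neighbours on BOTH sides."""
    n = len(inputs)
    preds = {}
    for i in targets:
        window = inputs[max(0, i - 1):min(n, i + 2)]
        preds[i] = sum(window) / len(window)
    return preds


def staged_predictions(latent: Sequence[float], prior: Sequence[float],
                       num_stages: int = 3, seed: int = 0) -> List[Dict[int, float]]:
    """Return per-stage predicted means. Each stage reads only the prior and
    tokens from earlier stages, so a decoder could replay it exactly."""
    decoded: Dict[int, float] = {}
    stage_preds = []
    for group in make_schedule(len(latent), num_stages, seed):
        preds = toy_predictor(build_inputs(decoded, prior), group)
        stage_preds.append(preds)
        for i in group:
            decoded[i] = latent[i]  # once coded, the true value becomes context
    return stage_preds


if __name__ == "__main__":
    latent = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    prior = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]  # hypothetical other-view latent
    for stage, preds in enumerate(staged_predictions(latent, prior)):
        print(stage, preds)
```

In a real codec, each stage's predicted means (and scales) would parameterize the entropy model for that stage's tokens, and the predictor would be a Transformer attending jointly over prior and target tokens; the averaging window here only stands in for that interaction.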