AI Summary
To address the low coding efficiency of stereo images in machine vision tasks, this paper formulates stereo image coding for machines (SICM) and proposes MVSFC-Net, a machine-vision-oriented, end-to-end stereo feature compression network. Unlike conventional human-vision-centric compression paradigms, MVSFC-Net compresses joint stereo features explicitly for downstream 3D tasks, such as depth estimation and stereo matching. It features a stereo multi-scale feature compression (SMFC) module that simultaneously removes spatial, inter-view, and cross-scale redundancies, yielding compact yet discriminative binocular representations. The framework integrates differentiable quantization, entropy modeling, and joint rate-distortion optimization. Experimental results show that MVSFC-Net outperforms both the MPEG-recommended ICM anchors and the state-of-the-art stereo image compression (SIC) method in compression efficiency and 3D task performance.
Abstract
2D image coding for machines (ICM) has achieved great success in coding efficiency, while less effort has been devoted to stereo images. To improve the efficiency of stereo image compression (SIC) and intelligent analysis, stereo image coding for machines (SICM) is formulated and explored in this paper. More specifically, a machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM, in which stereo visual features are extracted, compressed, and transmitted for 3D visual tasks. To compress stereo visual features efficiently in MVSFC-Net, a stereo multi-scale feature compression (SMFC) module is designed to gradually transform sparse stereo multi-scale features into compact joint visual representations by removing spatial, inter-view, and cross-scale redundancies simultaneously. Experimental results show that the proposed MVSFC-Net achieves superior compression efficiency and 3D visual task performance compared with the existing ICM anchors recommended by MPEG and the state-of-the-art SIC method.