Stereo Image Coding for Machines with Joint Visual Feature Compression

📅 2025-02-20
🤖 AI Summary
To address the low coding efficiency of stereo images in machine vision tasks, this paper formulates stereo image coding for machines (SICM) and proposes MVSFC-Net, a machine vision-oriented stereo feature compression network. Unlike conventional human-vision-centric compression paradigms, MVSFC-Net is a joint feature compression architecture designed for downstream 3D tasks such as depth estimation and stereo matching. It features a stereo multi-scale feature compression (SMFC) module that simultaneously removes spatial, inter-view, and cross-scale redundancies, yielding compact yet discriminative binocular representations. The framework integrates differentiable quantization, entropy modeling, and joint rate-distortion optimization. Experimental results show that MVSFC-Net outperforms both the MPEG-recommended ICM anchors and the state-of-the-art stereo image compression (SIC) method in compression efficiency and 3D task performance.
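
The summary mentions joint rate-distortion optimization without stating the objective. As a rough guide only, learned compression frameworks commonly minimize a Lagrangian of the following form; the symbols here ($\hat{\mathbf{y}}$ for the quantized joint stereo representation, $p$ for the learned entropy model, $D_{\text{task}}$ for a distortion measured on the downstream 3D task, and $\lambda$ for the trade-off weight) are assumptions for illustration and are not taken from the paper.

```latex
\mathcal{L} \;=\; \underbrace{\mathbb{E}\!\left[-\log_2 p(\hat{\mathbf{y}})\right]}_{\text{rate } R}
\;+\; \lambda \, D_{\text{task}}
```

Under this generic form, a larger $\lambda$ favors task accuracy at the cost of more bits, and a smaller $\lambda$ favors a shorter bitstream.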

πŸ“ Abstract
2D image coding for machines (ICM) has achieved great success in coding efficiency, while less effort has been devoted to stereo images. To promote the efficiency of stereo image compression (SIC) and intelligent analysis, stereo image coding for machines (SICM) is formulated and explored in this paper. More specifically, a machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM, in which stereo visual features are extracted, compressed, and transmitted for 3D visual tasks. To efficiently compress stereo visual features in MVSFC-Net, a stereo multi-scale feature compression (SMFC) module is designed to gradually transform sparse stereo multi-scale features into compact joint visual representations by removing spatial, inter-view, and cross-scale redundancies simultaneously. Experimental results show that the proposed MVSFC-Net achieves superior compression efficiency and 3D visual task performance compared with the existing ICM anchors recommended by MPEG and the state-of-the-art SIC method.
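
The abstract's point about inter-view redundancy can be illustrated with a toy experiment. The sketch below is not the SMFC module or any part of MVSFC-Net; it only assumes, hypothetically, that a right-view feature equals the left-view feature plus a small deviation, and compares the empirical entropy of coding both views independently against coding the left view plus a residual. All function names and the synthetic data model are made up for illustration.

```python
import math
import random
from collections import Counter


def empirical_entropy(symbols):
    """Shannon entropy, in bits per symbol, of a sequence of discrete symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def make_toy_stereo_features(n=20000, seed=0):
    """Synthetic, hypothetical stereo features: right = left + small deviation."""
    rng = random.Random(seed)
    left = [rng.gauss(0.0, 8.0) for _ in range(n)]
    right = [l + rng.gauss(0.0, 1.0) for l in left]
    return left, right


def quantize(values):
    """Uniform scalar quantization with step size 1."""
    return [round(v) for v in values]


left, right = make_toy_stereo_features()
q_left, q_right = quantize(left), quantize(right)

# Residual of the right view against the already-decoded left view.
q_residual = [r - l for r, l in zip(q_right, q_left)]

# Bits per feature pair when each view is coded on its own.
independent_bits = empirical_entropy(q_left) + empirical_entropy(q_right)
# Bits per feature pair when the right view is coded as a residual.
joint_bits = empirical_entropy(q_left) + empirical_entropy(q_residual)

print(f"independent: {independent_bits:.2f} bits/pair")
print(f"joint:       {joint_bits:.2f} bits/pair")
```

Because the residual has a much narrower distribution than either view, the joint figure comes out lower on this synthetic data. A learned codec would replace the fixed subtraction with trained transforms and a learned entropy model, but the source of the savings is the same correlation between views.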
Problem

Research questions and friction points this paper is trying to address.

Enhances stereo image compression for machine analysis
Proposes MVSFC-Net for efficient stereo feature extraction
Achieves superior 3D visual task performance and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stereo Image Coding for Machines
Machine Vision-Oriented Network
Stereo Multi-Scale Feature Compression
Authors
Dengchao Jin
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Jianjun Lei
Tianjin University
Multimedia, Video Coding, 3D/VR/AR, Artificial Intelligence, Pattern Recognition
Bo Peng
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Zhaoqing Pan
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Nam Ling
Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA 95053, USA
Qingming Huang
University of the Chinese Academy of Sciences
Multimedia Analysis and Retrieval, Image and Video Processing, Pattern Recognition, Computer Vision, Video Coding