🤖 AI Summary
Multi-view video suffers from large data volumes and high storage/transmission overheads, while existing end-to-end deep video coding methods primarily target single- or dual-view scenarios and lack efficient modeling for general multi-view configurations. This paper proposes LMVC, the first end-to-end learned multi-view video coding framework, supporting random access and HEVC backward compatibility. Its core innovations are: (1) feature-based inter-view motion vector prediction, which conditions dependent-view motion encoding on decoded independent-view motion features; (2) a disparity-free inter-view context prediction module; and (3) inter-view entropy models that jointly capture motion and content correlations across views. Experiments on standard benchmarks demonstrate that LMVC significantly outperforms MV-HEVC, achieving an average bitrate reduction of 28.6% and establishing a new state of the art for learned multi-view video compression.
📝 Abstract
Multiview video is a key data source for volumetric video, enabling immersive 3D scene reconstruction but posing significant challenges in storage and transmission due to its massive data volume. Recently, deep learning-based end-to-end video coding has achieved great success, yet most methods focus on single-view or stereo videos, leaving general multiview scenarios underexplored. This paper proposes an end-to-end learned multiview video coding (LMVC) framework that ensures random access and backward compatibility while enhancing compression efficiency. Our key innovation lies in effectively leveraging independent-view motion and content information to enhance dependent-view compression. Specifically, to exploit the inter-view motion correlation, we propose a feature-based inter-view motion vector prediction method that conditions dependent-view motion encoding on decoded independent-view motion features, along with an inter-view motion entropy model that learns inter-view motion priors. To exploit the inter-view content correlation, we propose a disparity-free inter-view context prediction module that predicts inter-view contexts from decoded independent-view content features, combined with an inter-view contextual entropy model that captures inter-view context priors. Experimental results show that our proposed LMVC framework outperforms the reference software of the traditional MV-HEVC standard by a large margin, establishing a strong baseline for future research in this field.
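The core intuition behind the inter-view entropy models, coding the dependent view under a prior conditioned on the already-decoded independent view, can be illustrated with a toy numpy sketch. Note this is not the paper's architecture: LMVC learns the conditioning from features, whereas here the "prediction" is simply the independent-view signal itself, and all names and numbers are illustrative.

```python
import math
import numpy as np

def gaussian_bin_bits(x, mu, sigma):
    """Bits to code integer symbols x under a discretized Gaussian N(mu, sigma^2),
    the usual rate proxy in learned coding."""
    erf = np.vectorize(math.erf)
    cdf = lambda v: 0.5 * (1.0 + erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = np.clip(cdf(x + 0.5) - cdf(x - 0.5), 1e-12, 1.0)
    return float(-np.log2(p).sum())

rng = np.random.default_rng(0)
# Toy "motion" signals: the dependent view closely tracks the independent view.
indep_mv = rng.normal(0.0, 4.0, size=256)            # decoded independent-view motion
dep_mv = indep_mv + rng.normal(0.0, 0.5, size=256)   # correlated dependent-view motion
dep_q = np.round(dep_mv)                             # quantized symbols to encode

# Without an inter-view prior: a fixed zero-mean Gaussian over the symbols.
bits_plain = gaussian_bin_bits(dep_q, mu=0.0, sigma=4.0)

# With an inter-view prior: center the Gaussian on the independent view
# (in LMVC this conditioning is learned; here it is just the identity).
bits_cond = gaussian_bin_bits(dep_q, mu=indep_mv, sigma=1.0)

print(f"bits without inter-view prior: {bits_plain:.0f}")
print(f"bits with inter-view prior:    {bits_cond:.0f}")
```

Because the dependent-view symbols concentrate tightly around the independent-view prediction, the conditional prior assigns them much higher probability and the estimated rate drops substantially, which is the same mechanism the learned inter-view motion and contextual entropy models exploit.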