LMVC: An End-to-End Learned Multiview Video Coding Framework

📅 2025-09-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multi-view video suffers from large data volumes and high storage/transmission overheads; existing end-to-end deep video coding methods primarily target single- or dual-view scenarios, lacking efficient modeling for general multi-view configurations. This paper proposes LMVC, the first end-to-end learned multi-view video coding framework, supporting random access and backward compatibility. Its core innovations are: (1) feature-based cross-view motion vector prediction, eliminating explicit disparity estimation; (2) a disparity-free cross-view contextual prediction module; and (3) cross-view entropy models that capture inter-view motion and content priors. Experiments on standard benchmarks demonstrate that LMVC significantly outperforms MV-HEVC, achieving an average bitrate reduction of 28.6% and establishing a strong baseline for learned multi-view video compression.

📝 Abstract
Multiview video is a key data source for volumetric video, enabling immersive 3D scene reconstruction but posing significant challenges in storage and transmission due to its massive data volume. Recently, deep learning-based end-to-end video coding has achieved great success, yet most focus on single-view or stereo videos, leaving general multiview scenarios underexplored. This paper proposes an end-to-end learned multiview video coding (LMVC) framework that ensures random access and backward compatibility while enhancing compression efficiency. Our key innovation lies in effectively leveraging independent-view motion and content information to enhance dependent-view compression. Specifically, to exploit the inter-view motion correlation, we propose a feature-based inter-view motion vector prediction method that conditions dependent-view motion encoding on decoded independent-view motion features, along with an inter-view motion entropy model that learns inter-view motion priors. To exploit the inter-view content correlation, we propose a disparity-free inter-view context prediction module that predicts inter-view contexts from decoded independent-view content features, combined with an inter-view contextual entropy model that captures inter-view context priors. Experimental results show that our proposed LMVC framework outperforms the reference software of the traditional MV-HEVC standard by a large margin, establishing a strong baseline for future research in this field.
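The core idea behind the cross-view conditioning described above can be illustrated with a toy example. This is not the paper's learned network; it is a minimal sketch of why conditioning dependent-view coding on decoded independent-view information saves bits: when motion is highly correlated across views, the residual against a cross-view prediction has far lower empirical entropy than the raw values, so an entropy coder spends fewer bits on it.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Bits per symbol of the empirical distribution
    (the ideal rate for an arithmetic coder matched to it)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical quantized motion vectors (x components) for two views.
# Decoded independent-view motion, available at the dependent-view decoder:
independent_mv = [11, 12, 10, 13, 11, 9]
# True dependent-view motion, strongly correlated with the view above:
dependent_mv = [12, 13, 11, 14, 12, 10]

# Coding raw values vs. coding the residual against the cross-view prediction
residual = [d - i for d, i in zip(dependent_mv, independent_mv)]

print(f"raw:      {empirical_entropy(dependent_mv):.2f} bits/symbol")
print(f"residual: {empirical_entropy(residual):.2f} bits/symbol")
```

In this contrived case the residual is constant, so its entropy collapses to zero; the actual framework replaces the hand-crafted subtraction with learned feature-domain prediction and learned entropy models, but the rate-saving mechanism is the same.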
Problem

Research questions and friction points this paper is trying to address.

How to compress general multiview video efficiently with a learned end-to-end codec
How to exploit inter-view motion and content correlations without explicit disparity estimation
How to preserve random access and backward compatibility in multiview coding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-based inter-view motion vector prediction
Disparity-free inter-view context prediction module
Inter-view motion and contextual entropy models
Xihua Sheng
University of Science and Technology of China → City University of Hong Kong
Video coding, image coding, point cloud coding
Yingwen Zhang
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Long Xu
Ningbo University, Peng Cheng Laboratory
Image/signal processing, video coding (especially rate control of video coding)
Shiqi Wang
Department of Computer Science, City University of Hong Kong, Hong Kong, China