🤖 AI Summary
Multi-view video suffers from large data volumes and high storage/transmission overheads, while existing end-to-end deep video coding methods primarily target single- or dual-view scenarios and lack efficient modeling for general multi-view configurations. This paper proposes LMVC, the first end-to-end learned multi-view video coding framework, supporting random access and HEVC backward compatibility. Its core innovations are: (1) feature-based inter-view motion vector prediction, which conditions dependent-view motion encoding on decoded independent-view motion features; (2) a disparity-free inter-view context prediction module; and (3) inter-view entropy models that jointly capture motion and content correlations across views. Experiments on standard benchmarks demonstrate that LMVC significantly outperforms MV-HEVC, achieving an average bitrate reduction of 28.6% and establishing a new state of the art for learned multi-view video compression.
📝 Abstract
Multiview video is a key data source for volumetric video, enabling immersive 3D scene reconstruction but posing significant challenges in storage and transmission due to its massive data volume. Recently, deep learning-based end-to-end video coding has achieved great success, yet most methods focus on single-view or stereo videos, leaving general multiview scenarios underexplored. This paper proposes an end-to-end learned multiview video coding (LMVC) framework that ensures random access and backward compatibility while enhancing compression efficiency. Our key innovation lies in effectively leveraging independent-view motion and content information to enhance dependent-view compression. Specifically, to exploit the inter-view motion correlation, we propose a feature-based inter-view motion vector prediction method that conditions dependent-view motion encoding on decoded independent-view motion features, along with an inter-view motion entropy model that learns inter-view motion priors. To exploit the inter-view content correlation, we propose a disparity-free inter-view context prediction module that predicts inter-view contexts from decoded independent-view content features, combined with an inter-view contextual entropy model that captures inter-view context priors. Experimental results show that our proposed LMVC framework outperforms the reference software of the traditional MV-HEVC standard by a large margin, establishing a strong baseline for future research in this field.
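The core intuition behind the inter-view entropy models, coding the dependent view under a prior conditioned on the already-decoded independent view, can be illustrated with a toy numpy sketch. Note this is not the paper's architecture: LMVC learns the conditioning from features, whereas here the "prediction" is simply the independent-view signal itself, and all names and numbers are illustrative.

```python
import math
import numpy as np

def gaussian_bin_bits(x, mu, sigma):
    """Bits to code integer symbols x under a discretized Gaussian N(mu, sigma^2),
    the usual rate proxy in learned coding."""
    erf = np.vectorize(math.erf)
    cdf = lambda v: 0.5 * (1.0 + erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = np.clip(cdf(x + 0.5) - cdf(x - 0.5), 1e-12, 1.0)
    return float(-np.log2(p).sum())

rng = np.random.default_rng(0)
# Toy "motion" signals: the dependent view closely tracks the independent view.
indep_mv = rng.normal(0.0, 4.0, size=256)            # decoded independent-view motion
dep_mv = indep_mv + rng.normal(0.0, 0.5, size=256)   # correlated dependent-view motion
dep_q = np.round(dep_mv)                             # quantized symbols to encode

# Without an inter-view prior: a fixed zero-mean Gaussian over the symbols.
bits_plain = gaussian_bin_bits(dep_q, mu=0.0, sigma=4.0)

# With an inter-view prior: center the Gaussian on the independent view
# (in LMVC this conditioning is learned; here it is just the identity).
bits_cond = gaussian_bin_bits(dep_q, mu=indep_mv, sigma=1.0)

print(f"bits without inter-view prior: {bits_plain:.0f}")
print(f"bits with inter-view prior:    {bits_cond:.0f}")
```

Because the dependent-view symbols concentrate tightly around the independent-view prediction, the conditional prior assigns them much higher probability and the estimated rate drops substantially, which is the same mechanism the learned inter-view motion and contextual entropy models exploit.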