L-LBVC: Long-Term Motion Estimation and Prediction for Learned Bi-Directional Video Compression

📅 2025-04-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Learned bi-directional video compression (LBVC) still lags behind traditional bi-directional coding, mainly because of inaccurate long-term motion estimation and prediction for distant frames, especially in large-motion scenes. To address this, the paper proposes L-LBVC, an LBVC framework with two adaptive components: (1) an adaptive motion estimation module that directly estimates optical flow for adjacent frames and for non-adjacent frames with small motions, and recursively accumulates local flows between adjacent frames to obtain long-term flows for non-adjacent frames with large motions; and (2) an adaptive motion prediction module that downsamples reference frames at test time to match the motion range observed during training, substantially reducing the bit cost of motion coding. Under the random-access configuration, L-LBVC significantly outperforms prior learned video compression methods and even surpasses VVC (VTM) on some standard test datasets.

📝 Abstract
Recently, learned video compression (LVC) has shown superior performance under low-delay configuration. However, the performance of learned bi-directional video compression (LBVC) still lags behind traditional bi-directional coding. The performance gap mainly arises from inaccurate long-term motion estimation and prediction of distant frames, especially in large motion scenes. To solve these two critical problems, this paper proposes a novel LBVC framework, namely L-LBVC. Firstly, we propose an adaptive motion estimation module that can handle both short-term and long-term motions. Specifically, we directly estimate the optical flows for adjacent frames and non-adjacent frames with small motions. For non-adjacent frames with large motions, we recursively accumulate local flows between adjacent frames to estimate long-term flows. Secondly, we propose an adaptive motion prediction module that can largely reduce the bit cost for motion coding. To improve the accuracy of long-term motion prediction, we adaptively downsample reference frames during testing to match the motion ranges observed during training. Experiments show that our L-LBVC significantly outperforms previous state-of-the-art LVC methods and even surpasses VVC (VTM) on some test datasets under random access configuration.
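The recursive flow accumulation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it composes a chain of adjacent-frame optical flows into one long-term flow via C(x) = A(x) + B(x + A(x)), using nearest-neighbor sampling for the warp (the paper would use a learned flow network and bilinear warping); all function names here are hypothetical.

```python
import numpy as np

def warp_flow(flow_b, flow_a):
    """Sample flow field flow_b at positions displaced by flow_a.

    flow_a maps frame 0 to frame m; flow_b maps frame m to frame m+1.
    Both are (H, W, 2) arrays of (dx, dy) displacements.
    Nearest-neighbor sampling with border clipping, for brevity.
    """
    h, w = flow_a.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x_m = np.clip(np.round(xs + flow_a[..., 0]).astype(int), 0, w - 1)
    y_m = np.clip(np.round(ys + flow_a[..., 1]).astype(int), 0, h - 1)
    return flow_b[y_m, x_m]

def accumulate_flows(local_flows):
    """Compose adjacent-frame flows into one long-term flow.

    Recursively: acc(x) <- acc(x) + next_flow(x + acc(x)).
    """
    acc = local_flows[0]
    for nxt in local_flows[1:]:
        acc = acc + warp_flow(nxt, acc)
    return acc
```

Composing local flows this way avoids directly estimating a single large displacement between distant frames, which is where off-the-shelf flow estimators tend to fail.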
Problem

Research questions and friction points this paper is trying to address.

Improving long-term motion estimation in bi-directional video compression
Enhancing motion prediction accuracy for distant frames
Reducing bit cost for motion coding in large motion scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive motion estimation for short and long-term flows
Recursive local flow accumulation for large motions
Adaptive motion prediction with reference frame downsampling
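The adaptive downsampling idea in the last bullet can be sketched like this: if the motion magnitude at test time exceeds the range the motion predictor saw during training, downsample the reference frames until displacements fall back into that range. This is a hedged illustration only; the threshold value, power-of-two schedule, and function names are assumptions, not the paper's specification.

```python
import numpy as np

def pick_downsample_factor(max_motion_px, trained_range_px=64.0):
    """Smallest power-of-two factor bringing motion into the trained range.

    trained_range_px is a hypothetical bound on displacements seen in training.
    """
    s = 1
    while max_motion_px / s > trained_range_px:
        s *= 2
    return s

def downsample(frame, s):
    """Average-pool an (H, W, C) frame by an integer factor s."""
    if s == 1:
        return frame
    h, w, c = frame.shape
    h2, w2 = h - h % s, w - w % s
    return frame[:h2, :w2].reshape(h2 // s, s, w2 // s, s, c).mean(axis=(1, 3))
```

Because downsampling by s also shrinks displacements by s, a predictor trained on small motions can operate in its comfort zone, and the resulting motion field is cheaper to code.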
Yongqi Zhai
1Peking University Shenzhen Graduate School, China; 2Pengcheng Laboratory, China
Luyang Tang
1Peking University Shenzhen Graduate School, China; 2Pengcheng Laboratory, China
Wei Jiang
1Peking University Shenzhen Graduate School, China; 2Pengcheng Laboratory, China
Jiayu Yang
The Australian National University
3D Computer Vision · 3D AIGC · 3D Reconstruction · Multi-view Stereo · VR/AR/XR
Ronggang Wang
Shenzhen Graduate School, Peking University
Immersive Video Coding and Processing