🤖 AI Summary
Learning-based bidirectional video compression (LBVC) still lags behind conventional bidirectional coding, mainly because of inaccurate long-term motion estimation and prediction for distant reference frames, especially in large-motion scenes. To address this, the paper proposes L-LBVC, a framework with adaptive motion estimation and prediction. Its key contributions are: (1) an adaptive motion estimation module that estimates optical flow directly for adjacent frames and for non-adjacent frames with small motion, and recursively accumulates local flows between adjacent frames to model long-term motion for non-adjacent frames with large motion; and (2) an adaptive motion prediction module that downsamples reference frames at test time so that motion magnitudes match the range observed during training, thereby reducing the bit cost of motion coding. Under the random-access configuration, L-LBVC significantly outperforms previous state-of-the-art learned video compression methods and even surpasses VVC (VTM) on some test datasets.
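The test-time downsampling idea can be illustrated with a minimal sketch. The paper does not specify the selection rule, so the function name, the power-of-two search, and the `max_factor` cap below are all illustrative assumptions; the actual module presumably operates on learned features and richer motion statistics.

```python
def choose_downsample_factor(peak_motion: float,
                             train_motion_range: float,
                             max_factor: int = 4) -> int:
    """Hypothetical rule: pick the smallest power-of-two downsampling
    factor that brings the estimated peak motion magnitude (in pixels)
    within the motion range seen during training."""
    factor = 1
    # Halving the resolution halves the apparent motion magnitude.
    while peak_motion / factor > train_motion_range and factor < max_factor:
        factor *= 2
    return factor
```

For example, a peak displacement of 50 pixels against a training range of 16 pixels would select a factor of 4, while motion already inside the training range would leave the reference frames at full resolution.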
📝 Abstract
Recently, learned video compression (LVC) has shown superior performance under the low-delay configuration. However, the performance of learned bi-directional video compression (LBVC) still lags behind that of traditional bi-directional coding. The gap mainly arises from inaccurate long-term motion estimation and prediction for distant frames, especially in large-motion scenes. To solve these two critical problems, this paper proposes a novel LBVC framework, named L-LBVC. First, we propose an adaptive motion estimation module that can handle both short-term and long-term motion. Specifically, we directly estimate the optical flows for adjacent frames and for non-adjacent frames with small motion. For non-adjacent frames with large motion, we recursively accumulate local flows between adjacent frames to estimate long-term flows. Second, we propose an adaptive motion prediction module that can largely reduce the bit cost of motion coding. To improve the accuracy of long-term motion prediction, we adaptively downsample reference frames during testing to match the motion ranges observed during training. Experiments show that our L-LBVC significantly outperforms previous state-of-the-art LVC methods and even surpasses VVC (VTM) on some test datasets under the random-access configuration.
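The recursive accumulation step can be sketched as follows: a long-term flow from frame 0 to frame k is built by repeatedly adding the next local flow, sampled at the positions reached by the accumulated displacement so far. This is a minimal NumPy illustration with nearest-neighbor sampling; the paper's module presumably uses learned flow estimators and bilinear warping, and both function names are assumptions.

```python
import numpy as np

def warp_flow(flow: np.ndarray, ref_flow: np.ndarray) -> np.ndarray:
    """Sample `flow` (H, W, 2) at positions displaced by `ref_flow`,
    using nearest-neighbor lookup clipped to the image border."""
    H, W, _ = flow.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sx = np.clip(np.round(xs + ref_flow[..., 0]).astype(int), 0, W - 1)
    sy = np.clip(np.round(ys + ref_flow[..., 1]).astype(int), 0, H - 1)
    return flow[sy, sx]

def accumulate_flows(local_flows: list[np.ndarray]) -> np.ndarray:
    """Compose a chain of adjacent-frame flows into one long-term flow.
    local_flows[i] maps frame i to frame i+1; the result maps frame 0
    to frame len(local_flows)."""
    acc = local_flows[0]
    for f in local_flows[1:]:
        # Add the next local flow, sampled where the accumulated
        # displacement lands each pixel.
        acc = acc + warp_flow(f, acc)
    return acc
```

With three local flows that each translate by one pixel horizontally, the accumulated flow is a three-pixel translation, matching the intuition that chaining adjacent-frame motion recovers the long-term displacement.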