🤖 AI Summary
To address the domain-shift issue in neural B-frame coding, where a mismatch between the GOP lengths used for training and inference leads to inaccurate motion estimation under large motion, this paper proposes a lightweight online motion resolution adaptation method. The approach dynamically predicts the optimal downsampling factor from simple state signals of the current and reference frames, using three plug-and-play lightweight classifiers, including a Co-Class variant that balances accuracy and efficiency, without requiring codec retraining. A focal loss optimizes the binary classifier, while soft labels derived from rate-distortion costs train the multi-class classifier; a selective search further accelerates the decision. Experiments show that the method achieves near-exhaustive-search coding performance at a fraction of the computational cost and integrates seamlessly into existing B-frame coding frameworks.
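The binary (Bi-Class) classifier mentioned above is trained with a focal loss, which down-weights easy examples so training concentrates on hard resolution decisions. A minimal sketch of the standard binary focal loss (Lin et al.) is below; the `gamma` and `alpha` values are the usual defaults, not values reported by this paper:

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample (hypothetical helper, not the paper's code).

    p: predicted probability of the positive class (e.g., "use low resolution")
    y: ground-truth label, 0 or 1
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss for well-classified (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

A confidently correct prediction contributes almost nothing, while a confidently wrong one dominates the batch loss, which is why focal loss suits the class-imbalanced resolution decision described in the summary.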
📝 Abstract
Learned B-frame codecs with hierarchical temporal prediction often encounter the domain-shift issue due to mismatches between the Group-of-Pictures (GOP) sizes for training and testing, leading to inaccurate motion estimates, particularly for large motion. A common solution is to turn large motion into small motion by downsampling video frames during motion estimation. However, determining the optimal downsampling factor typically requires costly rate-distortion optimization. This work introduces lightweight classifiers to predict downsampling factors. These classifiers leverage simple state signals from current and reference frames to balance rate-distortion performance with computational cost. Three variants are proposed: (1) a binary classifier (Bi-Class) trained with Focal Loss to choose between high and low resolutions, (2) a multi-class classifier (Mu-Class) trained with novel soft labels based on rate-distortion costs, and (3) a co-class approach (Co-Class) that combines the predictive capability of the multi-class classifier with the selective search of the binary classifier. All classifier methods can work seamlessly with existing B-frame codecs without requiring codec retraining. Experimental results show that they achieve coding performance comparable to exhaustive search methods while significantly reducing computational complexity. The code is available at: https://github.com/NYCU-MAPL/Fast-OMRA.git.
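The Mu-Class classifier is trained with soft labels derived from rate-distortion costs rather than a one-hot choice of downsampling factor. One plausible construction, sketched here as an assumption (the paper's exact formula is not given in the abstract), is a softmax over negative RD costs, so factors with nearly tied costs receive nearly equal probability mass:

```python
import math

def rd_soft_labels(rd_costs, temperature=1.0):
    """Map per-factor RD costs to a soft label distribution (illustrative only).

    rd_costs: list of rate-distortion costs, one per candidate downsampling factor
    temperature: assumed smoothing knob; lower values approach a one-hot label
    """
    scores = [-c / temperature for c in rd_costs]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]             # probabilities sum to 1, lowest cost highest
```

Training against such a distribution (e.g., with cross-entropy) penalizes the classifier less when it confuses two factors whose RD costs are close, which matches the stated goal of approximating exhaustive search cheaply.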