🤖 AI Summary
To address the domain-shift issue in neural B-frame coding, where a mismatch between the GOP lengths used for training and inference leads to inaccurate motion estimation under large motion, this paper proposes a lightweight online motion resolution adaptation method. The approach dynamically predicts the optimal downsampling factor from simple state signals of the current and reference frames, using three plug-and-play lightweight classifiers, including a Co-Class variant that balances accuracy and efficiency, without requiring codec retraining. A focal loss optimizes the binary classifier, while soft labels derived from rate-distortion costs train the multi-class classifier; a selective search further accelerates the decision. Experiments show that the method achieves near-exhaustive-search coding performance at a fraction of the computational cost and integrates seamlessly into existing B-frame coding frameworks.
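The binary (Bi-Class) classifier mentioned above is trained with a focal loss, which down-weights easy examples so training concentrates on hard resolution decisions. A minimal sketch of the standard binary focal loss (Lin et al.) is below; the `gamma` and `alpha` values are the usual defaults, not values reported by this paper:

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample (hypothetical helper, not the paper's code).

    p: predicted probability of the positive class (e.g., "use low resolution")
    y: ground-truth label, 0 or 1
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss for well-classified (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

A confidently correct prediction contributes almost nothing, while a confidently wrong one dominates the batch loss, which is why focal loss suits the class-imbalanced resolution decision described in the summary.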
📝 Abstract
Learned B-frame codecs with hierarchical temporal prediction often encounter the domain-shift issue due to mismatches between the Group-of-Pictures (GOP) sizes for training and testing, leading to inaccurate motion estimates, particularly for large motion. A common solution is to turn large motion into small motion by downsampling video frames during motion estimation. However, determining the optimal downsampling factor typically requires costly rate-distortion optimization. This work introduces lightweight classifiers to predict downsampling factors. These classifiers leverage simple state signals from current and reference frames to balance rate-distortion performance with computational cost. Three variants are proposed: (1) a binary classifier (Bi-Class) trained with Focal Loss to choose between high and low resolutions, (2) a multi-class classifier (Mu-Class) trained with novel soft labels based on rate-distortion costs, and (3) a co-class approach (Co-Class) that combines the predictive capability of the multi-class classifier with the selective search of the binary classifier. All classifier methods can work seamlessly with existing B-frame codecs without requiring codec retraining. Experimental results show that they achieve coding performance comparable to exhaustive search methods while significantly reducing computational complexity. The code is available at: https://github.com/NYCU-MAPL/Fast-OMRA.git.
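The Mu-Class classifier is trained with soft labels derived from rate-distortion costs rather than a one-hot choice of downsampling factor. One plausible construction, sketched here as an assumption (the paper's exact formula is not given in the abstract), is a softmax over negative RD costs, so factors with nearly tied costs receive nearly equal probability mass:

```python
import math

def rd_soft_labels(rd_costs, temperature=1.0):
    """Map per-factor RD costs to a soft label distribution (illustrative only).

    rd_costs: list of rate-distortion costs, one per candidate downsampling factor
    temperature: assumed smoothing knob; lower values approach a one-hot label
    """
    scores = [-c / temperature for c in rd_costs]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]             # probabilities sum to 1, lowest cost highest
```

Training against such a distribution (e.g., with cross-entropy) penalizes the classifier less when it confuses two factors whose RD costs are close, which matches the stated goal of approximating exhaustive search cheaply.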