🤖 AI Summary
Variable-bitrate neural video codecs (V-NVCs) are hard to train and tend to lose rate-distortion (RD) performance at high bitrates. This paper proposes EV-NVC, an efficient variable-bitrate neural video codec built on three components: (1) a piecewise linear sampler (PLS) that improves RD performance in the high-bitrate range; (2) a long-short-term feature fusion module (LSTFFM) that strengthens context modeling; and (3) mixed-precision training, together with a detailed discussion of the training strategy used at each stage. In low-delay mode, EV-NVC reduces BD-rate by 30.56% compared to HM-16.25.
📝 Abstract
Training a variable-rate neural video codec (NVC) is highly challenging because of the complex training strategies and model structure involved. In this paper, we train an efficient variable-bitrate neural video codec (EV-NVC) with a piecewise linear sampler (PLS) to improve rate-distortion performance in the high-bitrate range, and a long-short-term feature fusion module (LSTFFM) to enhance context modeling. In addition, we introduce mixed-precision training and discuss the training strategy for each stage in detail to evaluate its effectiveness. Experimental results show that our approach reduces BD-rate by 30.56% compared to HM-16.25 in low-delay mode.
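For readers unfamiliar with the BD-rate figure quoted above, the following is a minimal sketch of the classic Bjøntegaard delta-rate calculation, not the paper's evaluation code. It assumes exactly four rate-distortion points per codec, fits a cubic through (PSNR, log-rate) for each curve, and averages the log-rate gap over the shared PSNR range. The function names and the sample numbers are hypothetical and chosen for illustration only.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial passing through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(psnr, rate, lo, hi):
    """Average of the fitted log-rate curve over the PSNR interval [lo, hi]."""
    log_rate = [math.log(r) for r in rate]

    def curve(q):
        return _lagrange(psnr, log_rate, q)

    mid = 0.5 * (lo + hi)
    # Simpson's rule is exact for cubic polynomials, so one panel suffices.
    integral = (hi - lo) / 6.0 * (curve(lo) + 4.0 * curve(mid) + curve(hi))
    return integral / (hi - lo)


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Return the BD-rate of the test codec relative to the anchor, in percent.

    Negative values mean the test codec needs fewer bits at equal PSNR.
    Assumes four RD points per curve (the classic setup), so the cubic
    fit is an exact interpolation.
    """
    for seq in (rate_anchor, psnr_anchor, rate_test, psnr_test):
        if len(seq) != 4:
            raise ValueError("this sketch expects exactly four RD points per curve")
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    if hi <= lo:
        raise ValueError("the two curves share no PSNR range")
    avg_anchor = _mean_log_rate(psnr_anchor, rate_anchor, lo, hi)
    avg_test = _mean_log_rate(psnr_test, rate_test, lo, hi)
    return (math.exp(avg_test - avg_anchor) - 1.0) * 100.0


# Hypothetical RD points: the test codec uses 30% fewer bits at every PSNR.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps
anchor_psnr = [32.0, 35.0, 38.0, 41.0]          # dB
test_rate = [0.7 * r for r in anchor_rate]
print(round(bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr), 4))
```

With these made-up points the result is about -30, matching the uniform 30% bit saving built into the example. A reported figure such as 30.56% is typically this quantity averaged over test sequences, with the reference codec (here HM-16.25) as the anchor.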