🤖 AI Summary
Existing lightweight video codecs based on implicit neural representations (INRs) are limited in reconstruction quality at low complexity, and they typically rely on autoregressive entropy models, which are effective but slow to code. To address this, the authors propose NVRC-Lite, an ultra-lightweight extension of the Neural Video Representation Compression (NVRC) framework. The key contributions are: (1) multi-scale feature grids, whose higher-resolution levels significantly improve reconstruction quality at low computational complexity; and (2) an octree-based context model that replaces conventional autoregressive models and speeds up entropy coding of high-dimensional feature grids. The method builds on NVRC, which compresses INRs in a fully end-to-end manner. Experiments show that, compared with C3, one of the best lightweight INR-based video codecs, NVRC-Lite achieves BD-rate savings of up to 21.03% (PSNR) and 23.06% (MS-SSIM), while encoding 8.4× faster and decoding 2.5× faster.
📝 Abstract
Recent works have demonstrated the viability of over-fitted implicit neural representations (INRs) as alternatives to autoencoder-based models for neural video compression. Among these INR-based video codecs, Neural Video Representation Compression (NVRC) was the first to adopt a fully end-to-end framework for compressing INRs, achieving state-of-the-art performance. Moreover, some recently proposed lightweight INRs have shown performance comparable to their baseline codecs at a computational complexity below 10 kMACs/pixel. In this work, we extend NVRC toward lightweight representations and propose NVRC-Lite, which incorporates two key changes. First, we integrate multi-scale feature grids into our lightweight neural representation; the use of higher-resolution grids significantly improves the performance of INRs at low complexity. Second, we address the fact that existing INRs typically rely on autoregressive models for entropy coding, which are effective but impractical because of their slow coding speed. We propose an octree-based context model for entropy coding high-dimensional feature grids, which accelerates the entropy coding module of the model. Our experimental results demonstrate that NVRC-Lite outperforms C3, one of the best lightweight INR-based video codecs, with up to 21.03% and 23.06% BD-rate savings when measured in PSNR and MS-SSIM, respectively, while achieving an 8.4× encoding and 2.5× decoding speedup. The implementation of NVRC-Lite will be made available.
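For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of the classic Bjøntegaard delta-rate calculation using a cubic polynomial fit of log-bitrate against quality. It is not taken from the NVRC-Lite implementation; the function name `bd_rate`, its argument names, and the sample rate-distortion points are all hypothetical, and the evaluation protocol actually used by the authors (interpolation method, rate points, sequences) is not stated in this abstract.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate change (%) of a test codec relative to an anchor.

    Classic Bjontegaard formulation (illustrative, not the authors' code):
    fit log-rate as a cubic polynomial of quality (e.g. PSNR in dB) for
    each codec, then compare the average of the two fits over the quality
    range that both curves cover. Negative values mean bitrate savings.
    """
    log_rate_anchor = np.log(np.asarray(rate_anchor, dtype=float))
    log_rate_test = np.log(np.asarray(rate_test, dtype=float))
    quality_anchor = np.asarray(quality_anchor, dtype=float)
    quality_test = np.asarray(quality_test, dtype=float)

    # Cubic fit of log-rate versus quality for each codec.
    poly_anchor = np.polyfit(quality_anchor, log_rate_anchor, 3)
    poly_test = np.polyfit(quality_test, log_rate_test, 3)

    # Integrate only over the overlapping quality interval.
    low = max(quality_anchor.min(), quality_test.min())
    high = min(quality_anchor.max(), quality_test.max())

    integral_anchor = np.polyint(poly_anchor)
    integral_test = np.polyint(poly_test)
    area_anchor = np.polyval(integral_anchor, high) - np.polyval(integral_anchor, low)
    area_test = np.polyval(integral_test, high) - np.polyval(integral_test, low)

    # Mean log-rate difference, converted back to a percentage.
    mean_log_diff = (area_test - area_anchor) / (high - low)
    return (np.exp(mean_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate-distortion points, for illustration only.
    anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]   # kbps
    psnr = [32.0, 35.0, 38.0, 41.0]                  # dB
    test_rate = [0.8 * r for r in anchor_rate]       # same quality, 20% fewer bits
    print(f"BD-rate: {bd_rate(anchor_rate, psnr, test_rate, psnr):.2f}%")
```

In this toy example the test codec reaches every quality level with 20% fewer bits, so the result is a BD-rate of about -20%. A reported saving of 21.03% corresponds to a BD-rate of roughly -21% under this convention.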