How to Design and Train Your Implicit Neural Representation for Video Compression

📅 2025-06-30

🤖 AI Summary

Implicit neural representations (INRs) for video compression are competitive with traditional pipelines in visual quality and compression ratio, but they require training a network for every video, which makes encoding too slow for practical adoption. This work builds a library that decomposes NeRV-family methods into their components and compares them in terms of training time as well as size-quality trade-offs. From that analysis it proposes Rabbit NeRV (RNeRV), a configuration of those components. Given equal training time (equivalent to 300 NeRV epochs) on seven 1080p UVG videos, RNeRV achieves +1.27% PSNR on average over the best-performing alternative for each video. To decouple training from encoding, the work also investigates hyper-networks that predict INR weights directly from video inputs. Masking the predicted weights during training allows variable-quality compression and improves both PSNR and MS-SSIM by 1.7% at 0.037 bpp on UCF-101. Increasing hyper-network parameters by 0.4% yields 2.5%/2.7% improvements to PSNR/MS-SSIM at equal bpp and similar speed.

📝 Abstract

Implicit neural representation (INR) methods for video compression have recently achieved visual quality and compression ratios that are competitive with traditional pipelines. However, due to the need for per-sample network training, the encoding speeds of these methods are too slow for practical adoption. We develop a library to allow us to disentangle and review the components of methods from the NeRV family, reframing their performance in terms of not only size-quality trade-offs, but also impacts on training time. We uncover principles for effective video INR design and propose a state-of-the-art configuration of these components, Rabbit NeRV (RNeRV). When all methods are given equal training time (equivalent to 300 NeRV epochs) for 7 different UVG videos at 1080p, RNeRV achieves +1.27% PSNR on average compared to the best-performing alternative for each video in our NeRV library. We then tackle the encoding speed issue head-on by investigating the viability of hyper-networks, which predict INR weights from video inputs, to disentangle training from encoding to allow for real-time encoding. We propose masking the weights of the predicted INR during training to allow for variable, higher quality compression, resulting in 1.7% improvements to both PSNR and MS-SSIM at 0.037 bpp on the UCF-101 dataset, and we increase hyper-network parameters by 0.4% for 2.5%/2.7% improvements to PSNR/MS-SSIM with equal bpp and similar speeds. Our project website is available at https://mgwillia.github.io/vinrb/ and our code is available at https://github.com/mgwillia/vinrb.
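The abstract reports results in PSNR and bits per pixel (bpp). As a rough illustration of how these two quantities are commonly computed for an INR codec, here is a minimal sketch. It assumes pixel values normalized to [0, 1] and assumes the bitstream is simply the stored network parameters at a fixed bit width, ignoring any entropy coding. The function names and that size model are illustrative assumptions, not taken from the paper's code.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(num_params, bits_per_param, num_frames, height, width):
    """Assumed size model: stored model bits divided by total pixels in the video."""
    return num_params * bits_per_param / (num_frames * height * width)
```

Under this size model, lowering bpp means storing fewer parameters or fewer bits per parameter for the same video, which is why size-quality trade-offs are reported as PSNR at a given bpp.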

Problem

Research questions and friction points this paper is trying to address.

Improve encoding speed of implicit neural video compression

Identify design principles for effective video INR methods

Enable real-time encoding using hyper-networks for INRs

Innovation

Methods, ideas, or system contributions that make the work stand out.

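Proposes Rabbit NeRV (RNeRV), a configuration of NeRV-family components chosen by comparing methods under equal training time

Investigates hyper-networks that predict INR weights from video inputs, decoupling training from encoding

Masks the weights of the predicted INR during training, improving PSNR and MS-SSIM by 1.7% at 0.037 bpp on UCF-101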
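To make the hyper-network and masking ideas concrete, here is a toy sketch. The abstract does not say how the masks are built, so everything here is an assumption for illustration: a linear "hyper-network" maps clip features to a flat list of INR weights, and a prefix mask keeps only the first fraction of those weights. The names `predict_inr_weights`, `prefix_mask`, and `masked_weights` are hypothetical and are not the authors' implementation.

```python
def predict_inr_weights(clip_features, hyper_weights):
    """Toy linear hyper-network: each INR weight is a dot product with the clip features."""
    return [sum(h * f for h, f in zip(row, clip_features)) for row in hyper_weights]


def prefix_mask(num_weights, keep_ratio):
    """Assumed masking scheme: keep the first keep_ratio fraction of weights, zero the rest."""
    keep = max(0, min(num_weights, int(num_weights * keep_ratio)))
    return [1.0 if i < keep else 0.0 for i in range(num_weights)]


def masked_weights(weights, mask):
    """Apply a 0/1 mask elementwise to the predicted INR weights."""
    return [w * m for w, m in zip(weights, mask)]


# Example: predict three INR weights from a two-dimensional clip feature,
# then keep only the first two thirds of them.
features = [1.0, 2.0]
hyper = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
predicted = predict_inr_weights(features, hyper)
pruned = masked_weights(predicted, prefix_mask(len(predicted), 2 / 3))
```

In a scheme like this, training with varying keep ratios would let a single hyper-network serve several bitrates, since only the unmasked weights need to be stored at encoding time. Whether the paper samples ratios this way is not stated in the abstract.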
