LRConv-NeRV: Low Rank Convolution for Efficient Neural Video Compression

📅 2026-03-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational and memory cost of NeRV’s convolutional decoder, which hinders deployment on resource-constrained devices. The authors replace selected dense 3×3 convolutions in the NeRV decoder with structured low-rank separable convolutions (LRConv) and apply the factorization progressively, starting at the final (largest) decoder stage and moving toward earlier stages, which gives a controllable trade-off between efficiency and reconstruction quality. Applying LRConv only to the final decoder stage reduces decoder complexity by 68% (from 201.9 to 64.9 GFLOPs), shrinks model size by 9.3%, and lowers bitrate by approximately 9.2%, with negligible quality loss. In layer-aligned comparisons with existing work, the method maintains higher PSNR and MS-SSIM and improved temporal stability, and an LPIPS-based temporal flicker analysis shows temporal coherence close to the NeRV baseline.
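As a quick sanity check on the headline figure, the relative reduction can be recomputed from the two GFLOPs values quoted above. The helper name `relative_reduction` is introduced here only for illustration.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# GFLOPs values as reported in the summary above.
dense_gflops = 201.9
lrconv_gflops = 64.9

reduction = relative_reduction(dense_gflops, lrconv_gflops)
print(f"Decoder complexity reduction: {reduction:.1%}")
```

The result rounds to the 68% stated in the summary.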

📝 Abstract

Neural Representations for Videos (NeRV) encode entire video sequences within neural network parameters, offering an alternative paradigm to conventional video codecs. However, the convolutional decoder of NeRV remains computationally expensive and memory intensive, limiting its deployment in resource-constrained environments. This paper proposes LRConv-NeRV, an efficient NeRV variant that replaces selected dense 3×3 convolutional layers with structured low-rank separable convolutions, trained end-to-end within the decoder architecture. By progressively applying low-rank factorization from the largest decoder stage to earlier ones, LRConv-NeRV enables controllable trade-offs between reconstruction quality and efficiency. Extensive experiments demonstrate that applying LRConv only to the final decoder stage reduces decoder complexity by 68%, from 201.9 to 64.9 GFLOPs, and model size by 9.3%, while incurring negligible quality loss and achieving approximately 9.2% bitrate reduction. Under INT8 post-training quantization, LRConv-NeRV preserves reconstruction quality close to the dense NeRV baseline, whereas more aggressive factorization of early decoder stages leads to disproportionate quality degradation. Compared to existing work under layer-aligned settings, LRConv-NeRV achieves a more favorable efficiency versus quality trade-off, offering substantial GFLOPs and parameter reductions while maintaining higher PSNR/MS-SSIM and improved temporal stability. Temporal flicker analysis using LPIPS further shows that the proposed solution preserves temporal coherence close to the NeRV baseline. These results establish LRConv-NeRV as a potential architectural alternative for efficient neural video decoding under low-precision and resource-constrained settings.
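To make the efficiency argument concrete, the sketch below counts weights for a dense 3×3 convolution and for one possible low-rank separable replacement. The abstract does not spell out the exact LRConv structure, so the factorization used here (a 3×1 convolution down to `rank` channels followed by a 1×3 convolution up to the output channels) is an assumption, as are the layer sizes, the five-stage decoder, and every function name. The helper `stages_to_factorize` illustrates the progressive idea of factorizing the last stages first.

```python
def dense_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def lowrank_separable_params(c_in, c_out, rank, k=3):
    """Weights in an ASSUMED two-step factorization (bias ignored):
    a k x 1 convolution from c_in to `rank` channels, followed by
    a 1 x k convolution from `rank` to c_out channels."""
    return k * c_in * rank + k * rank * c_out


def macs(params, height, width):
    """Multiply-accumulates at stride 1 with 'same' padding:
    every weight is applied once per output pixel."""
    return params * height * width


def stages_to_factorize(num_stages, depth):
    """Indices of the decoder stages that use the low-rank layer when
    the factorization is applied to the last `depth` stages."""
    if not 0 <= depth <= num_stages:
        raise ValueError("depth must be between 0 and num_stages")
    return list(range(num_stages - depth, num_stages))


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, rank = 96, 96, 24
dense = dense_conv_params(c_in, c_out)
lowrank = lowrank_separable_params(c_in, c_out, rank)
print(f"dense: {dense} weights, low-rank: {lowrank} weights, "
      f"ratio: {dense / lowrank:.1f}x")

# Final stage only, then the last three stages of a 5-stage decoder.
print(stages_to_factorize(num_stages=5, depth=1))
print(stages_to_factorize(num_stages=5, depth=3))
```

Under these assumptions the per-layer saving depends only on the channel counts and the rank, while the MAC saving scales with output resolution. That is consistent with the abstract's observation that factorizing only the final (largest) stage already removes most of the decoder's GFLOPs while changing model size far less.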

Problem

Research questions and friction points this paper is trying to address.

Neural Video Compression

Computational Efficiency

Memory Intensive

Resource-Constrained Deployment

NeRV

Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-Rank Convolution

Neural Video Compression

Efficient Neural Representation