Boosting Neural Video Representation via Online Structural Reparameterization

📅 2025-11-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing neural video representation (NVR) methods suffer from two fundamental bottlenecks: limited model capacity and high computational overhead, hindering both compression performance and deployment flexibility. To address this, we propose an online structural reparameterization framework featuring a lightweight Enhanced Reparameterizable Block (ERB) and an online fusion strategy. During training, multi-branch convolutions dynamically expand model capacity; at inference, the network is equivalently converted into a single-branch structure, thus preserving expressive power while ensuring efficiency. Our method requires no modification to the backbone architecture and is readily integrable into existing pipelines. Evaluated on mainstream video datasets, it achieves PSNR gains of 0.37–2.7 dB over baselines, with comparable training time and decoding speed. This approach effectively breaks the conventional capacity–efficiency trade-off frontier in NVR.

📝 Abstract
Neural Video Representation (NVR) is a promising paradigm for video compression, showing great potential in improving video storage and transmission efficiency. While recent advances have pursued architectural refinements to improve representational capability, these methods typically involve complex designs, which may incur increased computational overhead and lack the flexibility to integrate into other frameworks. Moreover, the inherent limitation in model capacity restricts the expressiveness of NVR networks, resulting in a performance bottleneck. To overcome these limitations, we propose Online-RepNeRV, an NVR framework based on online structural reparameterization. Specifically, we propose a universal reparameterization block named ERB, which incorporates multiple parallel convolutional paths to enhance the model capacity. To mitigate the overhead, an online reparameterization strategy is adopted to dynamically fuse the parameters during training, and the multi-branch structure is equivalently converted into a single-branch structure after training. As a result, the additional computational and parameter complexity is confined to the encoding stage, without affecting the decoding efficiency. Extensive experiments on mainstream video datasets demonstrate that our method achieves an average PSNR gain of 0.37–2.7 dB over baseline methods, while maintaining comparable training time and decoding speed.
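The core idea behind structural reparameterization is that parallel convolutional branches, being linear, can be algebraically merged into a single kernel after training. The paper's ERB and online fusion strategy are not public here, so the following is only a minimal NumPy sketch of the general principle: a 3×3 branch and a 1×1 branch are fused by embedding the 1×1 kernel at the centre of the 3×3 kernel, and the single fused convolution reproduces the multi-branch output exactly.

```python
import numpy as np

def conv2d(x, k):
    """'Same'-padded 2-D cross-correlation of input x with an odd-sized kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))     # toy single-channel feature map
k3 = rng.standard_normal((3, 3))    # 3x3 branch kernel
k1 = rng.standard_normal((1, 1))    # parallel 1x1 branch kernel

# Training-time (multi-branch) forward: sum of the parallel convolutions.
y_multi = conv2d(x, k3) + conv2d(x, k1)

# Reparameterization: fold the 1x1 kernel into the centre of the 3x3 kernel.
k_fused = k3.copy()
k_fused[1, 1] += k1[0, 0]

# Inference-time (single-branch) forward: one convolution, identical output.
y_single = conv2d(x, k_fused)

assert np.allclose(y_multi, y_single)
```

In a real network the same fusion is applied per output channel (and batch-norm parameters are folded in first), which is what lets the extra capacity live only at encoding time while decoding runs a plain single-branch network.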
Problem

Research questions and friction points this paper is trying to address.

Enhancing neural video representation capacity for compression efficiency
Reducing computational complexity in video compression frameworks
Overcoming model expressiveness limitations in neural video networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online structural reparameterization enhances neural video representation
Multi-path convolutional block increases model capacity during training
Dynamic parameter fusion reduces decoding overhead post-training
Ziyi Li
Assistant Professor, MD Anderson Cancer Center
Biostatistics methods, Bioinformatics
Qingyu Mao
College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
Shuai Liu
College of Applied Technology, Shenzhen University, Shenzhen, China
Qilei Li
Central China Normal University
Deep learning, Computer science
Fanyang Meng
Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, China
Yongsheng Liang
Harbin Institute of Technology
Image Processing, Source Coding, Channel Coding