🤖 AI Summary
Existing neural video representation (NVR) methods suffer from two fundamental bottlenecks: limited model capacity and high computational overhead, which hinder both compression performance and deployment flexibility. To address this, we propose an online structural reparameterization framework featuring a lightweight Enhanced Reparameterizable Block (ERB) and an online fusion strategy. During training, multi-branch convolutions dynamically expand model capacity; at inference, the network is equivalently converted into a single-branch structure, preserving expressive power while ensuring efficiency. Our method requires no modification to the backbone architecture and is readily integrable into existing pipelines. Evaluated on mainstream video datasets, it achieves PSNR gains of 0.37–2.7 dB over baselines, with comparable training time and decoding speed. This approach effectively pushes past the conventional capacity–efficiency trade-off in NVR.
📝 Abstract
Neural Video Representation (NVR) is a promising paradigm for video compression, showing great potential for improving video storage and transmission efficiency. While recent advances have refined network architectures to improve representational capability, these methods typically involve complex designs that may incur increased computational overhead and lack the flexibility to integrate into other frameworks. Moreover, the inherent limitation in model capacity restricts the expressiveness of NVR networks, resulting in a performance bottleneck. To overcome these limitations, we propose Online-RepNeRV, an NVR framework based on online structural reparameterization. Specifically, we introduce a universal reparameterization block named ERB, which incorporates multiple parallel convolutional paths to enhance model capacity. To mitigate the overhead, an online reparameterization strategy dynamically fuses the parameters during training, and the multi-branch structure is equivalently converted into a single-branch structure after training. As a result, the additional computational and parameter complexity is confined to the encoding stage and does not affect decoding efficiency. Extensive experiments on mainstream video datasets demonstrate that our method achieves an average PSNR gain of 0.37–2.7 dB over baseline methods, while maintaining comparable training time and decoding speed.
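The equivalence at the heart of structural reparameterization rests on the linearity of convolution: the sum of several parallel convolution branches equals a single convolution with the summed (suitably padded) kernels. The sketch below is a minimal, hypothetical illustration of that identity in NumPy (single channel, valid padding, no BatchNorm); it is not the paper's ERB implementation, which would additionally fold normalization parameters into the fused kernel.

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2D cross-correlation of a single-channel image x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))          # toy input "frame"

k3 = rng.standard_normal((3, 3))         # 3x3 branch
k1 = rng.standard_normal((1, 1))         # 1x1 branch
kid = np.zeros((3, 3)); kid[1, 1] = 1.0  # identity branch as a delta kernel

# Embed the 1x1 kernel at the center of a 3x3 kernel so all branches share one shape.
k1_pad = np.zeros((3, 3)); k1_pad[1, 1] = k1[0, 0]

multi = conv2d(x, k3) + conv2d(x, k1_pad) + conv2d(x, kid)  # training-time multi-branch
fused = conv2d(x, k3 + k1_pad + kid)                        # inference-time single branch

print(np.allclose(multi, fused))  # True: the fused kernel reproduces the branches exactly
```

Because the fusion is exact, the multi-branch capacity is only paid for during encoding (training); the decoder sees a plain single-branch convolution.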