CWRNN-INVR: A Coupled WarpRNN based Implicit Neural Video Representation

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing implicit neural video representations do not make clear use of the complementary strengths of neural networks, which suit structured content, and learnable grids, which suit fine-grained detail. The authors propose a hybrid framework that explicitly separates video information into regular and irregular components. A coupled WarpRNN module models structured elements such as motion and geometry with multi-scale motion compensation, while a mixed residual grid jointly represents the remaining irregular appearance and motion details. On the UVG dataset, the method reaches an average PSNR of 33.73 dB with a 3M-parameter model, which the authors report as the best reconstruction quality among existing methods, and it also outperforms existing INVR methods on other downstream tasks.
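For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how PSNR is commonly computed for 8-bit frames and averaged over a sequence. It is not taken from the paper or its repository: the function names `psnr` and `average_psnr`, the peak value of 255, and the choice to average per-frame PSNR (rather than compute PSNR from the sequence-level MSE) are assumptions made for illustration.

```python
import numpy as np


def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped frames.

    `peak` is the maximum possible pixel value (255 assumed for 8-bit video).
    """
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        # Identical frames: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


def average_psnr(ref_frames, rec_frames, peak=255.0):
    """Mean of per-frame PSNR values (an assumed averaging convention)."""
    values = [psnr(r, x, peak) for r, x in zip(ref_frames, rec_frames)]
    return float(np.mean(values))
```

Evaluation protocols differ in colour space and averaging order, so numbers produced this way are only comparable to published figures when the same protocol is followed.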
📝 Abstract
Implicit Neural Video Representation (INVR) has emerged as a novel approach to video representation and compression that uses learnable grids and neural networks. Existing methods focus on developing new grid structures that are efficient for latent representation and neural network architectures with large representation capability, but the roles the two components play in video representation have not been studied. In this paper, the difference between neural-network-based INVR and grid-based INVR is first investigated from the perspective of video information composition to identify their respective advantages: the neural network suits general structure, while the grid suits specific detail. Accordingly, an INVR framework that mixes a neural network with a residual grid is proposed, in which the neural network represents the regular, structured information and the residual grid represents the remaining irregular information in a video. A Coupled WarpRNN-based multi-scale motion representation and compensation module is specifically designed to explicitly represent the regular, structured information, hence the name CWRNN-INVR. For the irregular information, a mixed residual grid is learned in which the irregular appearance and motion information are represented together. The mixed residual grid can be combined with the coupled WarpRNN in a way that allows network reuse. Experiments show that our method achieves the best reconstruction results among existing methods, with an average PSNR of 33.73 dB on the UVG dataset under the 3M model, and outperforms existing INVR methods in other downstream tasks. The code can be found at https://github.com/yiyang-sdu/CWRNN-INVR.git.
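The split between regular and irregular information can be illustrated with a generic motion-compensation-plus-residual toy example. This is a sketch of the general idea only, not the authors' Coupled WarpRNN or their mixed residual grid: the names `backward_warp` and `reconstruct`, the dense per-pixel flow field, the nearest-neighbour sampling, and the single-scale setup are all assumptions chosen to keep the example short.

```python
import numpy as np


def backward_warp(frame, flow):
    """Warp a (H, W) frame with a (H, W, 2) flow field of (dy, dx) offsets.

    Each output pixel copies the source pixel at (y + dy, x + dx), rounded to
    the nearest integer position and clipped to the frame borders.
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]


def reconstruct(prev_frame, flow, residual):
    """Structured prediction (warped previous frame) plus a residual term.

    The residual stands in for whatever the motion model cannot explain.
    """
    return backward_warp(prev_frame, flow) + residual


# Toy usage: a flow that samples one pixel to the right, plus an exact residual.
prev = np.arange(16, dtype=np.float64).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0
target = prev[::-1].copy()
residual = target - backward_warp(prev, flow)
restored = reconstruct(prev, flow, residual)
```

In this toy setting the residual is computed directly from the target frame; in a learned representation it would instead be stored in trainable parameters such as a grid, which is the role the abstract assigns to the residual grid.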
Problem

Research questions and friction points this paper is trying to address.

Implicit Neural Video Representation
video representation
neural network
learnable grid
motion compensation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Neural Video Representation
Coupled WarpRNN
Residual Grid
Multi-scale Motion Compensation
Hybrid Representation
Authors
Yiyang Li
University of Michigan
Yanbo Gao
Shandong University
Video Coding, 3D Video Processing, Deep Learning
Shuai Li
Shandong University
IndRNN, image/video coding, 3D video processing, computer vision, deep learning
Zhenyu Du
School of Control Science and Engineering, Shandong University, and Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250100, China
Jinglin Zhang
School of Control Science and Engineering, Shandong University, and Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250100, China
Hui Yuan
School of Control Science and Engineering, Shandong University, and Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250100, China
Mao Ye
University of Electronic Science and Technology of China, Sichuan, China
Xingyu Gao
Professor of Computer Science, Chinese Academy of Sciences
Machine Learning, Computer Vision, Multimedia, Ubiquitous Computing