🤖 AI Summary
This work addresses a key limitation of existing implicit neural representation (INR)-based video compression methods: their inability to effectively model complex temporal dependencies. To this end, we propose TeNeRV, a hierarchical temporal neural representation framework that captures both short- and long-term dependencies. It enhances local temporal consistency through an inter-frame feature fusion module and adapts to group-level variation via a Group-of-Pictures (GoP)-adaptive modulation mechanism. TeNeRV implicitly models video content as a continuous function and dynamically adjusts its neural representation parameters according to the GoP structure. Experimental results demonstrate that TeNeRV achieves significantly better rate-distortion performance than current INR-based video compression approaches.
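To make the GoP-adaptive parameter adjustment concrete, here is a minimal PyTorch-style sketch of how a learned per-GoP prior could modulate decoder features. The class name, the FiLM-style scale/shift form, and all signatures are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of GoP-adaptive modulation; the FiLM-style
# scale/shift form is an assumption, not the paper's exact mechanism.
import torch
import torch.nn as nn

class GoPAdaptiveModulation(nn.Module):
    """Learns one prior embedding per Group-of-Pictures (GoP) and maps it
    to per-channel scale/shift factors that modulate decoder features."""

    def __init__(self, num_gops: int, embed_dim: int, channels: int):
        super().__init__()
        # One learnable prior vector per GoP.
        self.gop_prior = nn.Embedding(num_gops, embed_dim)
        # Small MLP producing a scale and a shift for each feature channel.
        self.to_film = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, 2 * channels),
        )

    def forward(self, feat: torch.Tensor, gop_idx: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) decoder features; gop_idx: (B,) GoP index per frame.
        scale, shift = self.to_film(self.gop_prior(gop_idx)).chunk(2, dim=-1)
        scale = scale[:, :, None, None]  # broadcast over spatial dimensions
        shift = shift[:, :, None, None]
        return feat * (1.0 + scale) + shift
```

Modulating a shared backbone this way would let the network reuse most parameters across GoPs while storing only a small per-GoP prior, consistent with the summary's claim of GoP-structure-dependent parameter adjustment at low overhead.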
📝 Abstract
Video compression has recently benefited from implicit neural representations (INRs), which model videos as continuous functions. INRs offer compact storage and flexible reconstruction, providing a promising alternative to traditional codecs. However, most existing INR-based methods treat the temporal dimension as an independent input, limiting their ability to capture complex temporal dependencies. To address this, we propose TeNeRV, a Hierarchical Temporal Neural Representation for Videos that integrates short- and long-term dependencies through two key components. First, an Inter-Frame Feature Fusion (IFF) module aggregates features from adjacent frames, enforcing local temporal coherence and capturing fine-grained motion. Second, a GoP-Adaptive Modulation (GAM) mechanism partitions each video into Groups-of-Pictures and learns group-specific priors that modulate the network parameters, enabling adaptive representations across different GoPs. Extensive experiments demonstrate that TeNeRV consistently outperforms existing INR-based methods in rate-distortion performance, validating the effectiveness of the proposed approach.
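As one illustration of the inter-frame fusion idea, the sketch below aggregates each frame's features with its immediate temporal neighbours. The concat-plus-convolution fusion operator, the window size, and all names are assumptions for exposition; the paper's actual IFF module may differ.

```python
# Hypothetical sketch of inter-frame feature fusion; the fusion operator
# (concatenation + convolution, residual) is an illustrative assumption.
import torch
import torch.nn as nn

class InterFrameFusion(nn.Module):
    """Aggregates each frame's features with its temporal neighbours to
    encourage local temporal coherence."""

    def __init__(self, channels: int, window: int = 3):
        super().__init__()
        self.window = window  # number of neighbouring frames fused (odd)
        self.fuse = nn.Conv2d(window * channels, channels, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) per-frame features for one video or GoP.
        pad = self.window // 2
        # Replicate edge frames so every frame has a full neighbourhood.
        padded = torch.cat([feats[:1].expand(pad, -1, -1, -1),
                            feats,
                            feats[-1:].expand(pad, -1, -1, -1)], dim=0)
        # Stack each frame with its neighbours along the channel axis.
        windows = torch.cat([padded[i:i + feats.shape[0]]
                             for i in range(self.window)], dim=1)
        return feats + self.fuse(windows)  # residual fusion
```

Fusing a small temporal window in this fashion is one plausible way to realize the local coherence and fine-grained motion capture that the abstract attributes to the IFF module.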