AI Summary
Existing neural video coders (NVCs) struggle to fully model temporal redundancy, resulting in suboptimal utilization of reference frame information. To address this, we propose a reference-frame-driven contextual modulation mechanism that jointly optimizes prediction through optical-flow-guided temporal context generation and dynamic feature propagation modulation, complemented by a decoupled loss for redundancy suppression. Our key contributions are: (i) the first optical-flow-guided paradigm for dynamic contextual modulation, enabling high-fidelity temporal modeling; and (ii) a decoupled supervision strategy that enhances feature representation purity. Experiments demonstrate that our method achieves an average bitrate reduction of 22.7% over H.266/VVC and further reduces bitrate by 10.1% relative to the state-of-the-art neural codec DCVC-FM on standard test sets, significantly advancing the performance frontier of neural video compression.
Abstract
Efficient video coding depends heavily on exploiting temporal redundancy, which emerging conditional-coding-based neural video codecs (NVCs) achieve by extracting and leveraging temporal context. Although the latest NVCs have made remarkable progress in compression performance, their inherent temporal context propagation mechanism cannot sufficiently leverage reference information, limiting further improvement. In this paper, we address this limitation by modulating the temporal context with the reference frame in two steps. Specifically, we first propose flow orientation, which mines the inter-correlation between the reference frame and the prediction frame to generate an additional oriented temporal context. We then introduce context compensation, which uses the oriented context to modulate the propagated temporal context generated from the propagated reference feature. Through this synergy, supervised by a decoupling loss, irrelevant propagated information is effectively suppressed, ensuring better context modeling. Experimental results demonstrate that our codec achieves an average 22.7% bitrate reduction over the advanced traditional video codec H.266/VVC and an average 10.1% bitrate saving over the previous state-of-the-art NVC DCVC-FM. The code is available at https://github.com/Austin4USTC/DCMVC.
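The two-step modulation described in the abstract can be sketched roughly as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the function names (`warp_nearest`, `modulate_context`), the nearest-neighbor warping, and the sigmoid-gated fusion are all simplifying assumptions standing in for the learned flow-orientation and context-compensation modules.

```python
import numpy as np

def warp_nearest(feat, flow):
    """Flow-guided warping (illustrative): pull reference features along a
    per-pixel flow field to form an oriented temporal context.
    feat: (C, H, W) reference feature; flow: (2, H, W) giving (dx, dy)."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    return feat[:, src_y, src_x]  # advanced indexing broadcasts over channels

def modulate_context(propagated_ctx, oriented_ctx, gate):
    """Context compensation (illustrative): a gate in [0, 1] suppresses
    irrelevant propagated information and blends in the oriented context.
    In the real codec the gate would be predicted by a learned network."""
    g = 1.0 / (1.0 + np.exp(-gate))  # sigmoid
    return g * propagated_ctx + (1.0 - g) * oriented_ctx

# Toy usage: zero flow makes warping the identity; a zero gate gives an
# equal blend of propagated and oriented context.
ref_feat = np.arange(8.0).reshape(2, 2, 2)
oriented = warp_nearest(ref_feat, np.zeros((2, 2, 2)))
blended = modulate_context(np.ones((2, 2, 2)), oriented, np.zeros((2, 2, 2)))
```

The sketch only conveys the data flow: the oriented context comes from explicit flow-guided resampling of the reference, while the gate decides, per position, how much propagated context to keep.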