AI Summary
Existing neural video coders (NVCs) struggle to fully model temporal redundancy, resulting in suboptimal utilization of reference frame information. To address this, we propose a reference-frame-driven contextual modulation mechanism that jointly optimizes prediction through optical-flow-guided temporal context generation and dynamic feature propagation modulation, complemented by a decoupled loss for redundancy suppression. Our key contributions are: (i) the first optical-flow-guided paradigm for dynamic contextual modulation, enabling high-fidelity temporal modeling; and (ii) a decoupled supervision strategy that enhances feature representation purity. Experiments demonstrate that our method achieves an average bitrate reduction of 22.7% over H.266/VVC and further reduces bitrate by 10.1% relative to the state-of-the-art neural codec DCVC-FM on standard test sets, significantly advancing the performance frontier of neural video compression.
Abstract
Efficient video coding depends heavily on exploiting temporal redundancy, which emerging conditional-coding-based neural video codecs (NVCs) achieve by extracting and leveraging temporal context. Although the latest NVCs have made remarkable progress in compression performance, their inherent temporal context propagation mechanism cannot sufficiently leverage reference information, limiting further improvement. In this paper, we address this limitation by modulating the temporal context with the reference frame in two steps. Specifically, we first propose flow orientation, which mines the inter-correlation between the reference frame and the prediction frame to generate an additional oriented temporal context. We then introduce context compensation, which uses the oriented context to modulate the propagated temporal context generated from the propagated reference feature. Through this synergy, supervised by a decoupling loss, irrelevant propagated information is effectively suppressed, ensuring better context modeling. Experimental results demonstrate that our codec achieves an average 22.7% bitrate reduction over the advanced traditional video codec H.266/VVC and an average 10.1% bitrate saving over the previous state-of-the-art NVC DCVC-FM. The code is available at https://github.com/Austin4USTC/DCMVC.
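The two-step modulation described in the abstract can be sketched roughly as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the function names (`warp_nearest`, `modulate_context`), the nearest-neighbor warping, and the sigmoid-gated fusion are all simplifying assumptions standing in for the learned flow-orientation and context-compensation modules.

```python
import numpy as np

def warp_nearest(feat, flow):
    """Flow-guided warping (illustrative): pull reference features along a
    per-pixel flow field to form an oriented temporal context.
    feat: (C, H, W) reference feature; flow: (2, H, W) giving (dx, dy)."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    return feat[:, src_y, src_x]  # advanced indexing broadcasts over channels

def modulate_context(propagated_ctx, oriented_ctx, gate):
    """Context compensation (illustrative): a gate in [0, 1] suppresses
    irrelevant propagated information and blends in the oriented context.
    In the real codec the gate would be predicted by a learned network."""
    g = 1.0 / (1.0 + np.exp(-gate))  # sigmoid
    return g * propagated_ctx + (1.0 - g) * oriented_ctx

# Toy usage: zero flow makes warping the identity; a zero gate gives an
# equal blend of propagated and oriented context.
ref_feat = np.arange(8.0).reshape(2, 2, 2)
oriented = warp_nearest(ref_feat, np.zeros((2, 2, 2)))
blended = modulate_context(np.ones((2, 2, 2)), oriented, np.zeros((2, 2, 2)))
```

The sketch only conveys the data flow: the oriented context comes from explicit flow-guided resampling of the reference, while the gate decides, per position, how much propagated context to keep.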