Neural Video Compression with Context Modulation

📅 2025-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing neural video coders (NVCs) struggle to fully model temporal redundancy, resulting in suboptimal utilization of reference frame information. To address this, we propose a reference-frame-driven contextual modulation mechanism that jointly optimizes prediction through optical-flow-guided temporal context generation and dynamic feature propagation modulation, complemented by a decoupled loss for redundancy suppression. Our key contributions are: (i) the first optical-flow-guided paradigm for dynamic contextual modulation, enabling high-fidelity temporal modeling; and (ii) a decoupled supervision strategy that enhances feature representation purity. Experiments demonstrate that our method achieves an average bitrate reduction of 22.7% over H.266/VVC and further reduces bitrate by 10.1% relative to the state-of-the-art neural codec DCVC-FM on standard test sets, significantly advancing the performance frontier of neural video compression.

๐Ÿ“ Abstract
Efficient video coding depends heavily on exploiting temporal redundancy, which is usually achieved by extracting and leveraging temporal context in the emerging conditional coding-based neural video codecs (NVCs). Although the latest NVCs have achieved remarkable progress in compression performance, their inherent temporal context propagation mechanism cannot sufficiently leverage the reference information, limiting further improvement. In this paper, we address this limitation by modulating the temporal context with the reference frame in two steps. Specifically, we first propose flow orientation to mine the inter-correlation between the reference frame and the prediction frame, generating an additional oriented temporal context. Moreover, we introduce context compensation, which uses the oriented context to modulate the propagated temporal context generated from the propagated reference feature. Through this synergy mechanism and a decoupling loss supervision, irrelevant propagated information can be effectively eliminated to ensure better context modeling. Experimental results demonstrate that our codec achieves an average 22.7% bitrate reduction over the advanced traditional video codec H.266/VVC, and offers an average 10.1% bitrate saving over the previous state-of-the-art NVC DCVC-FM. The code is available at https://github.com/Austin4USTC/DCMVC.
Problem

Research questions and friction points this paper is trying to address.

Enhancing temporal context utilization in neural video compression
Improving inter-correlation mining between reference and prediction frames
Reducing bitrate via advanced context modulation and compensation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modulates temporal context with reference frame
Uses flow orientation for inter-correlation mining
Introduces context compensation for better modeling
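The two-step mechanism described above (flow-guided generation of an oriented context, then compensation of the propagated context) can be sketched in a toy form. This is a minimal illustrative sketch, not the paper's implementation: `warp_nearest`, `modulate_context`, and the fixed per-pixel `gate` are hypothetical stand-ins for the learned flow-orientation and context-compensation networks, and single-channel numpy arrays replace real feature maps.

```python
import numpy as np

def warp_nearest(ref, flow):
    """Warp a reference feature map by a flow field (nearest-neighbor).

    ref:  (H, W) feature map from the reference frame.
    flow: (H, W, 2) per-pixel displacement (dy, dx) pointing at the
          source location to sample from, as in backward warping.
    """
    H, W = ref.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return ref[src_y, src_x]

def modulate_context(propagated, oriented, gate):
    """Context compensation: blend the oriented context into the
    propagated one with a per-pixel gate (a stand-in for the learned
    modulation network that suppresses irrelevant propagated features)."""
    return gate * oriented + (1.0 - gate) * propagated

# Toy 4x4 example: a uniform flow of (-1, -1) samples each pixel from
# its up-left neighbor, i.e. the oriented context is the reference
# shifted down-right by one pixel.
ref = np.arange(16, dtype=float).reshape(4, 4)
flow = -np.ones((4, 4, 2))
oriented = warp_nearest(ref, flow)
propagated = ref * 0.5                      # stand-in for the propagated feature
ctx = modulate_context(propagated, oriented, gate=np.full((4, 4), 0.8))
```

In the actual codec the gate and the warped context would both be produced by neural networks and supervised jointly with the decoupling loss; the sketch only shows the data flow of "warp by flow, then gate-blend with the propagated context".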
Chuanbo Tang
University of Science and Technology of China
video compression, image compression
Zhuoyuan Li
University of Science and Technology of China (USTC)
Video Coding, Inter/Intra Prediction, In-Loop Filtering, Learned Compression
Yifan Bian
University of Science & Technology of China
Deep learning, end-to-end based image/video compression
Li Li
MOE Key Laboratory of Brain-Inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230027, China
Dong Liu
MOE Key Laboratory of Brain-Inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230027, China