🤖 AI Summary
This work addresses the challenge of balancing perceptual quality and signal fidelity in generative video compression by proposing a controllable generative compression framework. Leveraging keyframe structural priors and per-frame control priors, the method reconstructs non-keyframes through a generative mechanism guided by multiple visual conditions, preserving temporal and content consistency while recovering fine details. A color-distance-guided adaptive keyframe selection algorithm is introduced to dynamically optimize keyframe placement. Experimental results demonstrate that the proposed approach outperforms existing generative video compression methods in both perceptual quality and signal fidelity.
📝 Abstract
Perceptual video compression adopts generative video modeling to improve perceptual realism but frequently sacrifices signal fidelity, diverging from the goal of video compression to faithfully reproduce the visual signal. To alleviate this dilemma between perception and fidelity, in this paper we propose a Controllable Generative Video Compression (CGVC) paradigm that faithfully generates details guided by multiple visual conditions. Under this paradigm, representative keyframes of the scene are coded and used to provide structural priors for non-keyframe generation. A dense per-frame control prior is additionally coded to better preserve the finer structure and semantics of each non-keyframe. Guided by these priors, non-keyframes are reconstructed by a controllable video generation model with temporal and content consistency. Furthermore, to accurately recover the color information of the video, we develop a color-distance-guided keyframe selection algorithm that adaptively chooses keyframes. Experimental results show that CGVC outperforms previous perceptual video compression methods in terms of both signal fidelity and perceptual quality.