Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion

📅 2026-05-04
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work targets high-quality video compression at extremely low bitrates, where the generative process must be guided by compact conditioning signals. The proposed ActDiff-VC framework combines content-adaptive keyframe active sampling, budget-aware sparse point trajectory modeling, and conditional diffusion-based generation, so that perceptually realistic non-keyframes can be synthesized from only a few transmitted keyframes and sparse trajectories. On the UVG and MCL-JCV datasets, the method achieves up to 64.6% bitrate reduction at matched NIQE and, at comparable bitrates, improves KID by up to 64.6% and FID by up to 37.7% relative to strong learned codecs, while offering favorable perceptual rate-distortion trade-offs against learned and diffusion-based baselines.
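The content-adaptive keyframe sampling described above can be illustrated with a minimal sketch. The source does not state the selection criterion, so this example assumes a simple greedy drift test: a new keyframe (and thus a new variable-length segment) is opened when the current frame differs from the last keyframe by more than a threshold, or when a segment reaches a maximum length. The mean-absolute-difference measure and the names `select_keyframes`, `segments`, `threshold`, and `max_segment_len` are hypothetical, not the paper's method.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: Sequence[Sequence[float]],
                     threshold: float,
                     max_segment_len: int) -> List[int]:
    """Return indices of frames chosen as keyframes (hypothetical rule).

    Frame 0 is always a keyframe. A new keyframe is opened when the
    current frame drifts from the last keyframe by more than `threshold`,
    or when the current segment already spans `max_segment_len` frames.
    """
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        last = keyframes[-1]
        drift = mean_abs_diff(frames[i], frames[last])
        if drift > threshold or i - last >= max_segment_len:
            keyframes.append(i)
    return keyframes


def segments(keyframes: List[int], num_frames: int) -> List[range]:
    """Split [0, num_frames) into segments that each start at a keyframe."""
    bounds = keyframes + [num_frames]
    return [range(bounds[k], bounds[k + 1]) for k in range(len(keyframes))]


if __name__ == "__main__":
    # Toy 2-pixel "frames": a scene change occurs between frames 1 and 2.
    toy = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.2, 5.2], [5.1, 5.1]]
    keys = select_keyframes(toy, threshold=1.0, max_segment_len=3)
    print(keys)                      # [0, 2]
    print(segments(keys, len(toy)))  # [range(0, 2), range(2, 5)]
```

In this toy run only the scene change triggers a keyframe, so frames 1, 3, and 4 would be left for a generative decoder to synthesize; lowering `max_segment_len` to 2 forces an extra keyframe at frame 4.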
πŸ“ Abstract
Diffusion models provide a powerful generative prior for perceptual reconstruction at ultra-low bitrates, but effective video compression requires controlling the generative process using highly compact conditioning signals. In this work, we present ActDiff-VC, a diffusion-based video compression framework for the ultra-low-bitrate regime. Our method partitions videos into variable-length segments, transmits keyframes only when needed, and summarizes temporal dynamics using a compact set of tracked point trajectories. Conditioned on these sparse signals, a conditional diffusion decoder synthesizes the remaining frames, enabling perceptually realistic reconstruction under severe rate constraints. To support this design, we introduce two mechanisms: content-adaptive keyframe selection and budget-aware sparse trajectory selection, which together enable compact yet effective conditioning for generative reconstruction. Experiments on the UVG and MCL-JCV benchmarks show that ActDiff-VC achieves up to 64.6% bitrate reduction at matched NIQE, improves KID by up to 64.6% and FID by up to 37.7% at comparable bitrates against strong learned codecs, and delivers favorable perceptual rate-distortion trade-offs relative to learned and diffusion-based baselines in the ultra-low-bitrate regime.
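Budget-aware sparse trajectory selection can be pictured, purely as an assumption-laden sketch, as a knapsack-style choice: each tracked point trajectory carries an estimated encoding cost in bits and a usefulness score, and tracks are admitted in order of usefulness per bit until the trajectory budget is spent. The abstract does not specify how trajectories are scored or selected; the `Trajectory` fields, `select_trajectories`, and the greedy ratio heuristic are hypothetical and are not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Trajectory:
    """A tracked point trajectory with hypothetical per-track scores."""
    track_id: int
    utility: float  # assumed: how much this track helps reconstruction
    bits: int       # assumed: cost of encoding this track's positions

    def __post_init__(self) -> None:
        if self.bits <= 0:
            raise ValueError("bits must be positive")


def select_trajectories(tracks: Sequence[Trajectory], bit_budget: int) -> List[int]:
    """Greedy utility-per-bit selection under a total bit budget.

    This is the classic greedy heuristic for 0/1 knapsack: tracks are
    visited from best to worst utility-per-bit ratio and kept whenever
    they still fit. It is cheap but not guaranteed to be optimal.
    """
    ranked = sorted(tracks, key=lambda t: t.utility / t.bits, reverse=True)
    chosen: List[int] = []
    spent = 0
    for t in ranked:
        if spent + t.bits <= bit_budget:
            chosen.append(t.track_id)
            spent += t.bits
    return chosen


if __name__ == "__main__":
    tracks = [
        Trajectory(0, utility=10.0, bits=40),
        Trajectory(1, utility=6.0, bits=20),
        Trajectory(2, utility=1.0, bits=30),
        Trajectory(3, utility=4.0, bits=10),
    ]
    print(select_trajectories(tracks, bit_budget=64))  # [3, 1, 2]
```

With a 64-bit budget, track 0 is skipped because it no longer fits after tracks 3 and 1, while the cheaper track 2 still does. When the candidate set is small, an exact dynamic program over integer bit costs is an alternative to this greedy ordering.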
Problem

Research questions and friction points this paper is trying to address.

ultra-low-bitrate
video compression
diffusion models
conditional generation
perceptual reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional diffusion
ultra-low-bitrate compression
keyframe selection
sparse trajectory
generative video compression