🤖 AI Summary
Machine learning-based video coding generalizes poorly to user-generated content (UGC) videos, which exhibit high spatiotemporal variability. To address this, the paper proposes a Tri-Dynamic Preprocessing framework that, for the first time, couples three tunable adaptive mechanisms: dynamic preprocessing intensity, dynamic quantization level, and a dynamic λ-based rate-distortion trade-off. Built on a differentiable encoder simulator, adaptive factor modulation, and a dynamically adjusted λ loss function, the framework enables end-to-end joint rate-distortion optimization. Evaluated on a large-scale real-world UGC video benchmark, the method achieves an average BD-rate reduction of 12.6% over state-of-the-art approaches, improving compression efficiency while preserving visual quality.
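Reading the summary above, the joint objective plausibly takes the following shape; the symbols $f_\alpha$ (preprocessing at intensity $\alpha$), $\mathrm{Sim}_q$ (differentiable encoder simulator at quantization level $q$), and $\lambda(x)$ (content-adaptive trade-off) are our notation for illustration, not the paper's:

$$
\mathcal{L} \;=\; R(\hat{x}) \;+\; \lambda(x)\, D(x, \hat{x}),
\qquad \hat{x} \;=\; \mathrm{Sim}_q\big(f_\alpha(x)\big),
$$

where $R$ estimates the bitrate of the simulated encode and $D$ measures distortion against the original frame $x$; all three knobs ($\alpha$, $q$, $\lambda$) are optimized jointly, end to end.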
📝 Abstract
In recent years, user-generated content (UGC) has become the dominant force in internet traffic. However, UGC videos exhibit greater variability and more diverse characteristics than traditional encoding test videos, which undermines the effectiveness of data-driven machine learning algorithms for encoding optimization across broad UGC scenarios. To address this issue, we propose a Tri-Dynamic Preprocessing framework for UGC. First, we employ an adaptive factor to regulate preprocessing intensity. Second, an adaptive quantization level fine-tunes the codec simulator. Third, we utilize an adaptive λ trade-off to adjust the rate-distortion loss function. Experimental results on large-scale test sets demonstrate that our method delivers substantial rate-distortion gains over state-of-the-art approaches.
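To make the three adaptive knobs concrete, here is a minimal PyTorch sketch of one training step. Everything in it (`PreprocNet`, `CodecSimulator`, the QP-scaled noise and rate proxies, and the specific values of `alpha`, `qp`, and `lam`) is an illustrative assumption, not the paper's actual architecture or API:

```python
import torch
import torch.nn as nn

class PreprocNet(nn.Module):
    """Toy preprocessing network; an adaptive factor alpha blends its
    filtered residual back into the input (dynamic preprocessing intensity)."""
    def __init__(self, ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x, alpha):
        return x + alpha * self.body(x)  # alpha in [0, 1]

class CodecSimulator(nn.Module):
    """Toy differentiable stand-in for the encoder: distortion grows and
    rate shrinks as the quantization level qp rises (dynamic quantization)."""
    def forward(self, x, qp):
        x_rec = x + torch.randn_like(x) * (qp / 51.0) * 0.05  # QP-scaled noise
        rate = max(0.0, 1.0 - qp / 51.0) * x.abs().mean()     # crude bits proxy
        return x_rec, rate

def rd_loss(x, preproc, codec, alpha, qp, lam):
    """Rate-distortion loss combining the three adaptive knobs:
    preprocessing intensity (alpha), quantization level (qp),
    and the lambda trade-off (lam)."""
    x_pre = preproc(x, alpha)
    x_rec, rate = codec(x_pre, qp)
    dist = nn.functional.mse_loss(x_rec, x)
    return rate + lam * dist

if __name__ == "__main__":
    preproc, codec = PreprocNet(), CodecSimulator()
    opt = torch.optim.Adam(preproc.parameters(), lr=1e-4)
    x = torch.rand(2, 3, 64, 64)      # a toy batch of frames
    alpha, qp, lam = 0.7, 32.0, 0.1   # fixed here; adaptive per content in the paper
    opt.zero_grad()
    loss = rd_loss(x, preproc, codec, alpha, qp, lam)
    loss.backward()
    opt.step()
    print(f"loss = {loss.item():.4f}")
```

In the paper's setting, these three scalars would presumably be predicted per clip rather than hard-coded, and the simulator would approximate a real codec instead of additive noise; the sketch only shows how the three adaptive quantities enter one differentiable rate-distortion objective.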