🤖 AI Summary
To address the high inter coding complexity caused by QT+MTT block partitioning in Versatile Video Coding (VVC), this paper proposes a fast partitioning method based on partition maps. Extending the authors' earlier partition map-based approach for intra coding to inter coding, the method uses a lightweight neural network that jointly exploits spatial and temporal features to predict the partition map. The network combines stacked top-down and bottom-up processing, quantization parameter (QP) modulation layers, and partitioning-adaptive warping. The method also adds an MTT mask for early termination and a dual-threshold decision scheme that trades complexity reduction against rate-distortion (RD) performance loss. Under the random access configuration, experiments show an average 51.30% reduction in encoding time with a 2.12% BD-BR increase. The approach substantially lowers the cost of VVC inter encoding, which improves its practicality.
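The summary does not say how the two thresholds are applied. The sketch below shows one plausible reading and should not be taken as the paper's actual rule. It assumes the network outputs a probability for each candidate split mode of a block. If the most likely mode reaches the upper threshold `t_high`, only that mode is checked. Otherwise, every mode at or above the lower threshold `t_low` goes to the encoder's full RD check. The mode names, `candidate_modes`, and the threshold semantics are all hypothetical.

```python
from typing import Dict, List

# Hypothetical labels for the VVC split options of one coding block.
SPLIT_MODES = ["NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]


def candidate_modes(probs: Dict[str, float], t_low: float, t_high: float) -> List[str]:
    """Return the split modes the encoder should still RD-check.

    Assumed dual-threshold rule (illustrative only):
      * top mode probability >= t_high -> check only that mode (max speed-up);
      * otherwise keep every mode with probability >= t_low;
      * never return an empty list: fall back to the top mode.
    """
    if not 0.0 <= t_low <= t_high <= 1.0:
        raise ValueError("expected 0 <= t_low <= t_high <= 1")
    # max() returns the first maximal element, so ties resolve in SPLIT_MODES order.
    best = max(SPLIT_MODES, key=lambda m: probs.get(m, 0.0))
    if probs.get(best, 0.0) >= t_high:
        return [best]
    kept = [m for m in SPLIT_MODES if probs.get(m, 0.0) >= t_low]
    # If even the best mode is below t_low, kept is empty; check the best mode anyway.
    return kept or [best]


if __name__ == "__main__":
    confident = {"NO_SPLIT": 0.05, "QT": 0.9, "BT_H": 0.02, "BT_V": 0.02, "TT_H": 0.005, "TT_V": 0.005}
    uncertain = {"NO_SPLIT": 0.4, "QT": 0.12, "BT_H": 0.3, "BT_V": 0.15, "TT_H": 0.02, "TT_V": 0.01}
    print(candidate_modes(confident, t_low=0.1, t_high=0.8))
    print(candidate_modes(uncertain, t_low=0.1, t_high=0.8))
```

Under this reading, raising `t_high` or lowering `t_low` sends more modes to the RD search. That trades time saving for lower BD-BR loss, which is the kind of fine-grained knob the summary describes.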
📝 Abstract
Among the new techniques of Versatile Video Coding (VVC), the quadtree with nested multi-type tree (QT+MTT) block structure yields significant coding gains by providing more flexible block partitioning patterns. However, the recursive partition search in the VVC encoder substantially increases encoder complexity. To address this issue, we propose a partition map-based algorithm for fast block partitioning in inter coding. Building on our previous work on partition map-based methods for intra coding, we analyze the characteristics of VVC inter coding and, on that basis, improve the partition map by incorporating an MTT mask for early termination. Next, we develop a neural network that uses both spatial and temporal features to predict the partition map. It incorporates several dedicated designs: stacked top-down and bottom-up processing, quantization parameter (QP) modulation layers, and partitioning-adaptive warping. Furthermore, we present a dual-threshold decision scheme to achieve a fine-grained trade-off between complexity reduction and rate-distortion (RD) performance loss. The experimental results demonstrate that the proposed method achieves an average encoding time saving of 51.30% with a 2.12% Bjøntegaard Delta Bit Rate (BDBR) increase under the random access configuration.
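The abstract does not define the partition map's layout. The sketch below is only an illustration of how a predicted map could prune the recursive QT+MTT search. It assumes two per-region grids. The first is a QT depth map, where each cell stores the predicted quadtree depth of the area it covers. The second is a binary MTT mask, where 0 means the MTT search is terminated early for that area. A block tries a QT split only if some covered cell predicts a deeper QT level than the block's current depth. It tries MTT splits only if the mask is set somewhere inside it. The grid granularity (`cell`), the function names, and both map semantics are hypothetical.

```python
from typing import Iterator, List, Tuple

Grid = List[List[int]]  # row-major map over one region, one value per cell x cell area


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def covered_cells(x: int, y: int, w: int, h: int, cell: int) -> Iterator[Tuple[int, int]]:
    """Yield (row, col) of every map cell overlapped by block (x, y, w, h).

    Ceiling division on the far edge keeps blocks smaller than one cell
    (e.g. 4x8) mapped to the cell that contains them.
    """
    for r in range(y // cell, _ceil_div(y + h, cell)):
        for c in range(x // cell, _ceil_div(x + w, cell)):
            yield r, c


def try_qt_split(qt_depth_map: Grid, x: int, y: int, size: int, depth: int, cell: int = 8) -> bool:
    """Hypothetical rule: try QT only if some covered cell predicts a deeper QT level."""
    return any(qt_depth_map[r][c] > depth for r, c in covered_cells(x, y, size, size, cell))


def try_mtt_splits(mtt_mask: Grid, x: int, y: int, w: int, h: int, cell: int = 8) -> bool:
    """Hypothetical MTT early termination: skip all MTT splits if the mask is 0 throughout."""
    return any(mtt_mask[r][c] for r, c in covered_cells(x, y, w, h, cell))


if __name__ == "__main__":
    # Toy 32x32 region with 8x8 cells -> 4x4 maps (values are made up for illustration).
    qt_depth = [[1, 1, 2, 2],
                [1, 1, 2, 2],
                [1, 1, 1, 1],
                [1, 1, 1, 1]]
    mtt_mask = [[0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]]
    print(try_qt_split(qt_depth, 0, 0, 16, depth=1))         # top-left quadrant
    print(try_qt_split(qt_depth, 16, 0, 16, depth=1))        # top-right quadrant
    print(try_mtt_splits(mtt_mask, 0, 0, 16, 16))            # mask all zero here
    print(try_mtt_splits(mtt_mask, 0, 16, 16, 16))           # mask set in one cell
```

Both checks run before the encoder evaluates a split, so any branch they reject is never RD-tested. This is where the time saving of such a scheme would come from. Prediction errors in the maps show up as the BD-BR loss instead.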