🤖 AI Summary
Addressing the longstanding trade-off between coding efficiency and computational complexity in Versatile Video Coding (VVC), this paper proposes Early Termination using Reference Frames (ETRF), an early-termination strategy driven by the CTU partition maps of reference frames. ETRF is the first method to leverage CTU-level partition information from lower temporal layer reference frames to guide coding decisions for higher temporal layer frames, enabling cross-temporal-layer acceleration. Integrated into the VVenC encoder, ETRF combines CTU-level partition mapping, reference frame reuse, and rate-distortion optimization pruning, introducing a new preset between the Medium and Fast presets. Experimental results show that, compared to the Medium preset, ETRF reduces average encoding time by about 21% while maintaining comparable compression efficiency. On videos with high spatiotemporal complexity, ETRF achieves a better trade-off between bitrate and encoding time than the Fast preset, with only marginal BD-rate increases (≤0.3%), which supports its effectiveness in cutting computation at little cost in coding performance.
📝 Abstract
In response to the growing demand for high-quality video, Versatile Video Coding (VVC) was released in 2020. It builds on the hybrid coding architecture of its predecessor, HEVC, and achieves about 50% bitrate reduction at the same visual quality. VVC introduces more flexible block partitioning, which enhances compression efficiency at the cost of increased encoding complexity. To make efficient use of VVC in practical applications, optimization is essential. VVenC, an optimized open-source VVC encoder, introduces multiple presets to address the trade-off between compression efficiency and encoder complexity. Although an optimized set of encoding tools has been selected for each preset, the rate-distortion (RD) search space in the encoder presets still poses a challenge for efficient encoder implementations. In this paper, we propose Early Termination using Reference Frames (ETRF), which improves the trade-off between encoding efficiency and time complexity and positions itself as a new preset between the medium and fast presets. The coding tree unit (CTU) partitioning map of the reference frames in lower temporal layers is employed to accelerate the encoding of frames in higher temporal layers. The results show a reduction in encoding time of around 21% compared to the medium preset. Specifically, for videos with high spatial and temporal complexity, which typically require longer encoding times, the proposed method achieves a better trade-off between bitrate savings and encoding time than the fast preset.
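
The abstract does not spell out the termination rule, so the following Python sketch only illustrates the general idea of reusing reference-frame partition maps. It assumes, hypothetically, that each reference frame stores the maximum partition depth reached in every CTU, and that the partition search for a co-located CTU in a higher temporal layer stops once it reaches that depth plus a small margin. The names `depth_limit` and `should_terminate`, the `margin` parameter, and the `max_depth` value of 6 are illustrative assumptions, not part of VVenC or the paper.

```python
from typing import Dict, List, Tuple

CtuPos = Tuple[int, int]       # (ctu_row, ctu_col) position of a CTU in the frame
DepthMap = Dict[CtuPos, int]   # assumed: max partition depth reached in each CTU


def depth_limit(pos: CtuPos, ref_maps: List[DepthMap],
                margin: int = 1, max_depth: int = 6) -> int:
    """Deepest partition depth to search for the CTU at `pos`.

    Hypothetical rule: take the deepest co-located depth found in the
    lower-temporal-layer reference frames, add a safety margin, and
    clamp to the encoder's maximum depth (assumed to be 6 here).
    """
    depths = [m[pos] for m in ref_maps if pos in m]
    if not depths:
        # No reference information for this CTU: fall back to full search.
        return max_depth
    return min(max(depths) + margin, max_depth)


def should_terminate(current_depth: int, pos: CtuPos, ref_maps: List[DepthMap],
                     margin: int = 1, max_depth: int = 6) -> bool:
    """True if the RD search should stop splitting at `current_depth`."""
    return current_depth >= depth_limit(pos, ref_maps, margin, max_depth)


# Two reference frames from lower temporal layers (toy data).
ref_maps = [{(0, 0): 1, (0, 1): 3}, {(0, 0): 2}]
limit_smooth = depth_limit((0, 0), ref_maps)    # deepest reference depth 2, plus margin
limit_unknown = depth_limit((5, 5), ref_maps)   # no reference data, full search
```

In this sketch, a CTU whose reference counterparts were split only shallowly skips the deeper RD checks, which is where the time saving would come from. A larger `margin` would trade speed for compression efficiency.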