DQ-Ladder: A Deep Reinforcement Learning-based Bitrate Ladder for Adaptive Video Streaming

📅 2026-03-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of conventional fixed bitrate ladders: they ignore variations in video content and decoding complexity, which leads to suboptimal trade-offs between encoding time, decoding efficiency, and video quality. The study proposes DQ-Ladder, a deep reinforcement learning scheme for constructing bitrate ladders. A Deep Q-Network (DQN) agent takes per-segment predictions of decoding time, bitrate, and objective quality (VMAF, XPSNR), all produced by machine learning models rather than by exhaustive encoding, and builds the ladder under a weighted reward function of decoding time, video quality, and resolution smoothness. Evaluated with the VVC toolchain (VVenC/VVdeC), DQ-Ladder achieves BD-rate reductions of at least 10.3% for XPSNR relative to the standard HLS ladder while reducing decoding time by 22%. It is also reported to be significantly less sensitive to prediction errors than competing methods, remaining robust with up to 20% noise.
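
For readers unfamiliar with the BD-rate figure quoted above, the following is a minimal sketch of how such a number can be computed from two rate-quality curves. It is a simplification and not the authors' evaluation code: the usual Bjøntegaard method fits a cubic (or piecewise cubic) curve to log-bitrate as a function of quality, whereas this sketch uses piecewise-linear interpolation to stay dependency-free. The function names and the sample points are hypothetical.

```python
import math

def _interp(x, xs, ys):
    """Linearly interpolate ys over strictly increasing xs at point x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")

def _mean_log_rate(points, lo, hi):
    """Average log10(bitrate) over the quality interval [lo, hi]."""
    pts = sorted(points, key=lambda p: p[1])      # sort by quality
    qs = [q for _, q in pts]
    logs = [math.log10(r) for r, _ in pts]
    # Integrate piecewise: the trapezoid rule is exact between breakpoints.
    knots = [lo] + [q for q in qs if lo < q < hi] + [hi]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        area += 0.5 * (_interp(a, qs, logs) + _interp(b, qs, logs)) * (b - a)
    return area / (hi - lo)

def bd_rate(anchor, test):
    """Percent bitrate change of `test` vs `anchor` at equal quality.

    Both arguments are lists of (bitrate, quality) pairs with distinct
    quality values. A negative result means `test` needs less bitrate.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (10.0 ** diff - 1.0) * 100.0

# Hypothetical curves: the test ladder uses 10% less bitrate at every quality.
anchor = [(500, 32.0), (1000, 36.0), (2000, 40.0), (4000, 44.0)]
test = [(450, 32.0), (900, 36.0), (1800, 40.0), (3600, 44.0)]
saving = bd_rate(anchor, test)
```

Under this sign convention, a "BD-rate reduction" corresponds to a negative return value.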

📝 Abstract

Adaptive streaming of segmented video over HTTP typically relies on a predefined set of bitrate-resolution pairs, known as a bitrate ladder. However, fixed ladders often overlook variations in content and decoding complexities, leading to suboptimal trade-offs between encoding time, decoding efficiency, and video quality. This article introduces DQ-Ladder, a deep reinforcement learning (DRL)-based scheme for constructing time- and quality-aware bitrate ladders for adaptive video streaming applications. DQ-Ladder employs predicted decoding time, quality scores, and bitrate levels per segment as inputs to a Deep Q-Network (DQN) agent, guided by a weighted reward function of decoding time, video quality, and resolution smoothness. We leverage machine learning models to predict decoding time, bitrate level, and objective quality metrics (VMAF, XPSNR), eliminating the need for exhaustive encoding or quality metric computation. We evaluate DQ-Ladder using the Versatile Video Coding (VVC) toolchain (VVenC/VVdeC) on 750 video sequences across six Apple HLS-compliant resolutions and 41 quantization parameters. Experimental results against four baselines show that DQ-Ladder achieves BD-rate reductions of at least 10.3% for XPSNR compared to the HLS ladder, while reducing decoding time by 22%. DQ-Ladder shows significantly lower sensitivity to prediction errors than competing methods, remaining robust even with up to 20% noise.
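
The abstract names three reward terms (decoding time, video quality, resolution smoothness) but does not give the formula. The sketch below shows one plausible shape for such a weighted reward, purely as an illustration. The `Rung` type, the weights, the normalization constants, and the linear combination itself are all assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rung:
    """One candidate ladder rung (hypothetical representation)."""
    height: int         # resolution height in pixels
    quality: float      # predicted quality score on a 0-100 scale
    decode_time: float  # predicted decoding time in seconds

def reward(prev: Optional[Rung], cand: Rung,
           w_quality: float = 1.0,        # assumed weight
           w_time: float = 0.5,           # assumed weight
           w_smooth: float = 0.2,         # assumed weight
           max_decode_time: float = 1.0,  # assumed normalization bound
           max_height: int = 2160) -> float:  # assumed largest resolution
    """Reward for adding `cand` to the ladder after `prev`.

    Higher predicted quality raises the reward; longer predicted decoding
    time and larger resolution jumps between neighbouring rungs lower it.
    """
    quality_term = cand.quality / 100.0
    time_term = min(cand.decode_time / max_decode_time, 1.0)
    smooth_term = 0.0 if prev is None else abs(cand.height - prev.height) / max_height
    return w_quality * quality_term - w_time * time_term - w_smooth * smooth_term

# Same candidate, scored as a first rung and after a much smaller rung.
first = reward(None, Rung(1080, 90.0, 0.5))
after_jump = reward(Rung(360, 60.0, 0.1), Rung(1080, 90.0, 0.5))
```

In a DQN setting, a scalar of this kind would be returned by the environment after each rung selection. Changing the weights shifts the ladder toward higher quality, faster decoding, or smoother resolution steps.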

Problem

Research questions and friction points this paper is trying to address.

bitrate ladder

adaptive video streaming

decoding complexity

video quality

content variation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning

Bitrate Ladder

Adaptive Video Streaming