🤖 AI Summary
This work systematically evaluates the practical performance and deployment bottlenecks of learning-based congestion control algorithms, particularly reinforcement learning (RL) approaches, in dynamic network environments. Addressing critical gaps in prior work, including the lack of reproducible benchmarking and insufficient robustness evaluation, we establish a unified experimental framework for rigorous comparison among TCP Cubic, BBR, and state-of-the-art learning-based methods. Our key methodological innovation is the explicit incorporation of fairness into the RL reward function. Results reveal that while learning-based algorithms achieve high bandwidth utilization while largely maintaining low latency, their performance degrades significantly under joint bandwidth–delay fluctuations, even though they remain resistant to non-congestion-induced packet loss; moreover, the learned fairness properties fail to generalize to unseen network conditions. To foster transparency and reproducibility, we fully open-source all code, datasets, and evaluation pipelines, advancing accountable AI-driven networking research.
📝 Abstract
Learning-based congestion control (CC), including reinforcement learning (RL), promises efficient CC in a fast-changing networking landscape, where evolving communication technologies, applications and traffic workloads pose severe challenges to human-derived, static CC algorithms. Learning-based CC is in its early days, and substantial research is required to understand existing limitations, identify research challenges and, eventually, yield deployable solutions for real-world networks. In this paper, we extend our prior work and present a reproducible and systematic study of learning-based CC, with the aim of highlighting strengths and uncovering fundamental limitations of the state-of-the-art. We directly contrast these approaches with widely deployed, human-derived CC algorithms, namely TCP Cubic and BBR (version 3). We identify challenges in evaluating learning-based CC, establish a methodology for studying such approaches, and perform large-scale experimentation with publicly available learning-based CC approaches. We show that embedding fairness directly into reward functions is effective; however, the learned fairness properties do not generalise to unseen conditions. We then show that existing RL-based approaches can acquire all available bandwidth while largely maintaining low latency. Finally, we highlight that the latest learning-based CC approaches under-perform when the available bandwidth and end-to-end latency change dynamically, while remaining resistant to non-congestive loss. As with our initial study, our experimentation codebase and datasets are publicly available, with the aim of galvanising the research community towards transparency and reproducibility, which have been recognised as crucial for researching and evaluating machine-generated policies.
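To make the idea of embedding fairness directly into an RL reward function concrete, the sketch below shows one plausible construction: a classic throughput/latency/loss reward augmented with Jain's fairness index over competing flows. The function names, the specific reward shape, and all coefficients (`a`–`d`) are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

def jain_fairness(throughputs):
    """Jain's fairness index over competing flows.

    Equals 1.0 when all flows receive identical throughput and
    approaches 1/n as a single flow monopolises the link.
    """
    x = np.asarray(throughputs, dtype=float)
    if x.sum() == 0:
        return 1.0  # no traffic: trivially fair
    return (x.sum() ** 2) / (len(x) * (x ** 2).sum())

def reward(throughput, latency, loss_rate, flow_throughputs,
           a=1.0, b=1.0, c=1.0, d=1.0):
    """Hypothetical fairness-augmented per-step RL reward.

    The first three terms reward utilisation and penalise delay and
    loss, as in many learning-based CC schemes; the last term adds an
    explicit fairness signal. Coefficients are illustrative only.
    """
    return (a * throughput
            - b * latency
            - c * loss_rate
            + d * jain_fairness(flow_throughputs))
```

Under this shaping, an agent that grabs all bandwidth (`flow_throughputs = [10, 0]`) earns a strictly lower reward than one sharing the same total evenly (`[5, 5]`), which is the mechanism by which fairness can be trained into the policy on the conditions seen during training; whether that property carries over to unseen topologies is exactly the generalisation question the paper examines.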