2BRobust - Overcoming TCP BBR Performance Degradation in Virtual Machines under CPU Contention

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a reliability problem of TCP BBR in virtualized environments: under CPU contention, BBR throughput can collapse to near zero, which undermines its use in cloud and CDN deployments. Building on earlier reports of the problem in Xen VMs, the authors show that, in stark contrast to Cubic, the breakdown can occur under any hypervisor and under all tested bandwidth-delay product (BDP) conditions. CPU contention is emulated in a fully controlled way using Linux deadline scheduling. To mitigate the problem, the authors propose a minimal BBR patch that detects CPU-limited operation by monitoring inflight bytes and responds by increasing the pacing rate. In the measurements, CPU-limited BBR senders are capped at throughput levels below 10-20 Mbps; the patch overcomes this throughput problem for the most critical cases.

📝 Abstract
Motivated by the recent introduction and large-scale deployment of BBR congestion control algorithms, multiple studies have investigated the performance and fairness implications of this shift from loss-based to delay-based congestion control. Given the potential Internet-wide adoption of BBR, we must also consider its robustness in network and system scenarios. One such scenario is cloud-based Virtual Machine (VM) networking, which is highly relevant in today's CDN-centric Internet. Interestingly, previous work has shown significant performance problems of BBRv1 and BBRv2 running in Xen VMs, with BBR performance dropping to almost zero when CPU credit is low. In this paper, we develop a framework for measuring TCP throughput under fully controlled CPU contention, which uses Linux deadline scheduling to emulate generalized CPU contention conditions. Our measurements reveal that, in stark contrast to Cubic, BBR throughput can break down during CPU contention under any hypervisor and all tested BDP conditions. Characterizing this performance degradation at a fine-grained level, we show that CPU-limited BBR senders are capped at very low throughput levels below 10-20 Mbps. This finding implies that an Internet-wide shift from Cubic to BBR could harm the Internet's overall robustness if not deployed with caution. To detect and overcome CPU-limited throughput, we propose a minimal BBR patch which detects the problematic situation by monitoring inflight bytes and reacts by increasing the pacing rate to make better use of the available CPU time. We show that our BBR patch overcomes the throughput problem for the most critical cases.
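
The abstract describes the patch only at a high level (monitor inflight bytes, raise the pacing rate), so the following Python sketch is a toy illustration of that general idea and not the authors' kernel patch. Everything in it is an assumption made for illustration: the `SenderState` fields, the `adjust_pacing_rate` function, the use of the estimated BDP as the inflight reference, and the `low_fraction`, `boost`, and `max_rate_bps` parameters.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    """Hypothetical snapshot of a BBR-like sender (not a kernel structure)."""
    pacing_rate_bps: float  # current pacing rate in bits per second
    btl_bw_bps: float       # estimated bottleneck bandwidth in bits per second
    min_rtt_s: float        # estimated minimum round-trip time in seconds


def bdp_bytes(state: SenderState) -> float:
    """Bandwidth-delay product in bytes: bandwidth (bit/s) * RTT (s) / 8."""
    return state.btl_bw_bps * state.min_rtt_s / 8.0


def adjust_pacing_rate(
    state: SenderState,
    inflight_bytes: float,
    low_fraction: float = 0.5,   # assumed threshold, not taken from the paper
    boost: float = 1.25,         # assumed multiplicative increase
    max_rate_bps: float = 10e9,  # assumed safety cap
) -> float:
    """Return a new pacing rate.

    If the inflight data is well below the estimated BDP, treat the sender
    as unable to fill the pipe (for example, because it gets too little
    CPU time) and raise the pacing rate so that it sends more per time
    slice. Otherwise leave the pacing rate unchanged.
    """
    if inflight_bytes < low_fraction * bdp_bytes(state):
        return min(state.pacing_rate_bps * boost, max_rate_bps)
    return state.pacing_rate_bps


if __name__ == "__main__":
    s = SenderState(pacing_rate_bps=100e6, btl_bw_bps=100e6, min_rtt_s=0.04)
    print(bdp_bytes(s))                    # estimated BDP in bytes
    print(adjust_pacing_rate(s, 100_000))  # low inflight: rate is boosted
    print(adjust_pacing_rate(s, 400_000))  # sufficient inflight: unchanged
```

A multiplicative boost with a hard cap is used here only because it is simple to reason about; how the actual patch detects the CPU-limited state and how strongly it raises the pacing rate is specified in the paper, not in this listing.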

Problem

Research questions and friction points this paper is trying to address.

BBR
TCP congestion control
CPU contention
Virtual Machines
throughput degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

BBR
CPU contention
virtual machines
TCP congestion control
pacing rate