🤖 AI Summary
In cloud- and AI-driven datacenters, intra-datacenter (intra-DC) and inter-datacenter (inter-DC) traffic share the same network, but their widely different round-trip times (RTTs) create a congestion-control mismatch. Intra-DC flows react to congestion quickly and take a disproportionate share of bandwidth, which hurts rate fairness, while inter-DC flows recover from loss slowly, which hurts reliability. Existing solutions handle the two traffic classes with separate control mechanisms and so fail to provide fairness and robustness together. The paper proposes Uno, a unified congestion control and connection management architecture that combines RTT-aware fast feedback, fair rate allocation, erasure-coding-enhanced load balancing, and adaptive routing in a single protocol stack. The design targets latency, throughput, fairness, and reliability jointly. In the reported experiments, flow completion times improve over Gemini by 32% for inter-DC traffic and 24% for intra-DC traffic, along with gains in end-to-end communication efficiency and fairness.
📝 Abstract
Cloud computing and AI workloads are driving unprecedented demand for efficient communication within and across datacenters. However, the coexistence of intra- and inter-datacenter traffic within datacenters, together with the disparity between intra- and inter-datacenter RTTs, complicates congestion management and traffic routing. In particular, the faster congestion response of intra-datacenter traffic causes rate unfairness when it competes with slower inter-datacenter flows. Additionally, inter-datacenter messages suffer from slow loss recovery and thus need stronger reliability mechanisms. Existing solutions overlook these challenges and handle inter- and intra-datacenter congestion with separate control loops or at different granularities. We propose Uno, a unified system for both inter- and intra-DC environments that integrates a transport protocol for rapid congestion reaction and fair rate control with a load balancing scheme that combines erasure coding and adaptive routing. Our findings show that Uno significantly improves the completion times of both inter- and intra-DC flows compared to state-of-the-art methods such as Gemini.
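
To illustrate why erasure coding helps with slow inter-DC loss recovery, here is a minimal sketch of the simplest possible erasure code: one XOR parity packet per block. It is not Uno's actual scheme, which the abstract does not specify; the function names (`encode_block`, `recover_block`), the block size, and the equal-length packet assumption are all hypothetical. The point is only that a receiver can rebuild a single lost packet locally instead of waiting a full inter-DC round trip for a retransmission.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_block(packets: List[bytes]) -> List[bytes]:
    """Return the data packets followed by one XOR parity packet.

    Assumes all packets in the block have the same length (hypothetical
    simplification; a real transport would pad or frame them).
    """
    parity = reduce(xor_bytes, packets)
    return list(packets) + [parity]


def recover_block(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data packets of one block.

    `received` holds the coded block in order, with None marking a packet
    that was lost in transit. A single parity packet can repair at most
    one loss per block.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair at most one loss per block")
    block = list(received)
    if missing:
        # XOR of all surviving packets equals the missing one.
        survivors = [p for p in block if p is not None]
        block[missing[0]] = reduce(xor_bytes, survivors)
    return block[:-1]  # drop the parity packet, keep the data


# Usage: lose the second packet of a three-packet block and repair it.
data = [b"abcd", b"efgh", b"ijkl"]
coded = encode_block(data)
damaged = [coded[0], None, coded[2], coded[3]]
restored = recover_block(damaged)
```

With one parity packet per three data packets, this sketch spends about 33% extra bandwidth to avoid a retransmission round trip for any single loss in the block. Practical systems typically use stronger codes that tolerate multiple losses per block.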