🤖 AI Summary
Existing approaches struggle to achieve fine-grained, flow-level weighted bandwidth allocation in large-scale cloud data centers without topology awareness, flow tables, or centralized controllers. This paper proposes Söze, a fully decentralized architecture that relies only on simple telemetry primitives of commodity Ethernet switches, such as in-band network telemetry (INT), combined with distributed feedback control and a lightweight weight-to-rate mapping mechanism. It demonstrates precise flow-level weighted bandwidth enforcement without requiring per-flow, topology, or routing knowledge, and without global state synchronization. Evaluation via simulations and testbed experiments shows that Söze improves TPC-H job completion time by $0.79\times$ on average and by up to $0.59\times$. The system scales to tens of thousands of concurrent flows while enabling real-time, adaptive bandwidth allocation, enhancing the flexibility and scalability of data center network resource management.
📝 Abstract
Weighted bandwidth allocation is a powerful abstraction that has a wide range of use cases in modern data center networks. However, realizing highly agile and precise weighted bandwidth allocation for large-scale cloud environments is fundamentally challenging. In this paper, we propose Söze, a lightweight decentralized weighted bandwidth allocation system that leverages simple network telemetry features of commodity Ethernet switches. Given the flow weights, Söze can effectively use the telemetry information to compute and enforce the weighted bandwidth allocations without per-flow, topology, or routing knowledge. We demonstrate the effectiveness of Söze through simulations and testbed experiments, improving TPC-H job completion time by up to $0.59\times$ and $0.79\times$ on average.
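To make the abstraction concrete, the sketch below computes the target of a weighted allocation on a single bottleneck link: each flow receives capacity in proportion to its weight, and capacity left unused by demand-limited flows is redistributed among the rest. This is not Söze's mechanism, which is decentralized and telemetry-driven. It is a centralized, single-link illustration of the allocation such a system aims to converge to, and the function name `weighted_shares` and its parameters are hypothetical.

```python
from math import inf


def weighted_shares(capacity, weights, demands=None):
    """Weighted max-min fair rates for flows sharing one link.

    capacity: link capacity (any consistent unit, e.g. Gbps)
    weights:  positive weight per flow
    demands:  optional per-flow rate cap; unlimited if omitted
    """
    n = len(weights)
    if demands is None:
        demands = [inf] * n
    rates = [0.0] * n
    active = set(range(n))
    remaining = float(capacity)

    while active:
        total_weight = sum(weights[i] for i in active)
        per_weight = remaining / total_weight  # rate per unit of weight

        # Flows whose demand is below their weighted share are satisfied first.
        capped = {i for i in active if demands[i] <= per_weight * weights[i]}
        if not capped:
            for i in active:
                rates[i] = per_weight * weights[i]
            break

        # Grant the capped flows their demand and redistribute the leftover.
        for i in capped:
            rates[i] = demands[i]
            remaining -= demands[i]
        active -= capped

    return rates


if __name__ == "__main__":
    # Three flows with weights 1:2:1 on a 100-unit link.
    print(weighted_shares(100, [1, 2, 1]))
    # The first flow only needs 10 units; the rest is split 1:2.
    print(weighted_shares(100, [1, 1, 2], [10, inf, inf]))
```

In a multi-hop network the same idea applies per bottleneck, which is where avoiding per-flow, topology, and routing knowledge becomes the hard part that the abstract refers to.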