Söze: One Network Telemetry Is All You Need for Per-flow Weighted Bandwidth Allocation at Scale

📅 2025-06-01

🤖 AI Summary
Existing approaches struggle to deliver fine-grained, per-flow weighted bandwidth allocation in large-scale cloud data centers without relying on topology awareness, per-flow state, or a centralized controller. This paper proposes Söze, a lightweight decentralized system that uses only simple network telemetry features of commodity Ethernet switches. Given the flow weights, Söze uses the telemetry information to compute and enforce the weighted allocations without per-flow, topology, or routing knowledge. In simulations and testbed experiments, it improves TPC-H job completion time by up to 0.59× and by 0.79× on average.

📝 Abstract
Weighted bandwidth allocation is a powerful abstraction that has a wide range of use cases in modern data center networks. However, realizing highly agile and precise weighted bandwidth allocation for large-scale cloud environments is fundamentally challenging. In this paper, we propose Söze, a lightweight decentralized weighted bandwidth allocation system that leverages simple network telemetry features of commodity Ethernet switches. Given the flow weights, Söze can effectively use the telemetry information to compute and enforce the weighted bandwidth allocations without per-flow, topology, or routing knowledge. We demonstrate the effectiveness of Söze through simulations and testbed experiments, improving TPC-H jobs completion time by up to $0.59\times$ and $0.79\times$ on average.
Problem

Research questions and friction points this paper is trying to address.

Achieving agile weighted bandwidth allocation in large-scale clouds
Enforcing flow weights in a decentralized way using only simple switch telemetry
Improving job completion times without per-flow, topology, or routing knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses simple network telemetry features
Decentralized weighted bandwidth allocation
No per-flow, topology, or routing knowledge needed
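
To make the target of weighted bandwidth allocation concrete, the sketch below computes a weighted max-min fair allocation on a single bottleneck link. It is a centralized textbook illustration of the allocation such a system aims for, not Söze's decentralized, telemetry-driven mechanism, which this page does not describe in detail. The function name `weighted_max_min`, the flow names, and the per-flow demand caps are all assumptions introduced for illustration.

```python
from typing import Dict


def weighted_max_min(
    capacity: float,
    weights: Dict[str, float],
    demands: Dict[str, float],
) -> Dict[str, float]:
    """Weighted max-min fair rates on one link (illustrative water-filling).

    Hypothetical helper: a flow never gets more than its demand, and
    leftover capacity is split among unsatisfied flows in proportion
    to their weights.
    """
    alloc = {flow: 0.0 for flow in weights}
    active = set(weights)
    remaining = capacity

    while active and remaining > 1e-12:
        total_weight = sum(weights[f] for f in active)
        share_per_weight = remaining / total_weight

        # Flows whose demand fits inside their weighted share are satisfied.
        capped = {f for f in active if demands[f] <= weights[f] * share_per_weight}

        if not capped:
            # Everyone left is bottlenecked: split by weight and stop.
            for f in active:
                alloc[f] = weights[f] * share_per_weight
            break

        for f in capped:
            alloc[f] = demands[f]
            remaining -= demands[f]
        active -= capped

    return alloc


if __name__ == "__main__":
    rates = weighted_max_min(
        capacity=100.0,
        weights={"a": 1.0, "b": 2.0, "c": 1.0},
        demands={"a": 10.0, "b": 100.0, "c": 100.0},
    )
    print(rates)
```

In this example, flow `a` is limited by its own demand, and the remaining capacity is divided between `b` and `c` in a 2:1 ratio, matching their weights.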