Zephyrus: Scaling Gateways Beyond the Petabit-Era with DPU-Augmented Hierarchical Co-Offloading

📅 2025-10-13
🤖 AI Summary
ByteDance's previous-generation cloud gateways face growing resource pressure as cloud-network traffic increases, and the practical integration of DPUs with programmable switching ASICs such as Tofino in cloud gateways remains unexplored. This paper presents Zephyrus, a production-scale gateway built on a unified P4 pipeline spanning Tofino and DPUs. Its hierarchical co-offloading architecture (HLCO) orchestrates traffic across the heterogeneous hardware, achieving a hardware offload rate above 99% while retaining software fallback paths for complex operations. Compared with LuoShen (NSDI '24), Zephyrus delivers 33% higher throughput, with the evaluation indicating 21% lower power consumption and 14% lower hardware cost. Against the FPGA-based Albatross (SIGCOMM '25), it doubles throughput at a substantially lower total cost of ownership (TCO).

📝 Abstract
Operating at petabit scale, ByteDance's cloud gateways are deployed at critical aggregation points to orchestrate a wide array of business traffic. However, this massive scale imposes significant resource pressure on our previous-generation cloud gateways, rendering them unsustainable in the face of ever-growing cloud-network traffic. As the DPU market rapidly expands, we see a promising path to meet our escalating business traffic demands by integrating DPUs with our established Tofino-based gateways. DPUs augment these gateways with substantially larger table capacities and richer programmability without compromising the existing low-latency, high-throughput forwarding. Despite these compelling advantages, the practical integration of DPUs into cloud gateways remains unexplored, primarily due to underlying challenges. In this paper, we present Zephyrus, a production-scale gateway built upon a unified P4 pipeline spanning high-performance Tofino and feature-rich DPUs, which successfully overcomes these challenges. We further introduce a hierarchical co-offloading architecture (HLCO) to orchestrate traffic flow within this heterogeneous gateway, achieving >99% hardware offloading while retaining software fallback paths for complex operations. Zephyrus outperforms LuoShen (NSDI '24) with 33% higher throughput, and our evaluation further indicates 21% lower power consumption and 14% lower hardware cost. Against the FPGA-based system Albatross (SIGCOMM '25), it doubles the throughput at a substantially lower Total Cost of Ownership (TCO), showcasing its superior performance-per-dollar. Beyond these performance gains, we also share key lessons from several years of developing and operating Zephyrus at production scale. We believe these insights provide valuable references for researchers and practitioners designing performant cloud gateways.
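
The abstract reports the LuoShen comparison as three separate percentages. As a rough illustration of how they combine, the sketch below derives relative throughput-per-watt and throughput-per-hardware-dollar ratios. It assumes the three figures were measured on comparable configurations, which the abstract does not state, so the derived ratios are back-of-the-envelope values and not results from the paper. The function name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(throughput_gain: float, cost_reduction: float) -> float:
    """Ratio of (throughput / cost) for the new system versus the baseline.

    throughput_gain: fractional throughput increase (0.33 means 33% higher).
    cost_reduction:  fractional decrease in the cost metric (0.21 means 21% lower).
    """
    return (1.0 + throughput_gain) / (1.0 - cost_reduction)


# Figures quoted in the abstract for Zephyrus versus LuoShen.
# ASSUMPTION: all three were measured under comparable conditions.
perf_per_watt = relative_efficiency(0.33, 0.21)       # 1.33 / 0.79
perf_per_hw_dollar = relative_efficiency(0.33, 0.14)  # 1.33 / 0.86

print(f"throughput per watt:            {perf_per_watt:.2f}x")
print(f"throughput per hardware dollar: {perf_per_hw_dollar:.2f}x")
```

Under that assumption the ratios come out to roughly 1.68x and 1.55x respectively.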
Problem

Research questions and friction points this paper is trying to address.

Scaling cloud gateways beyond petabit-era traffic demands
Integrating DPUs with existing Tofino-based gateway infrastructure
Overcoming resource pressure from massive cloud-network traffic growth
Innovation

Methods, ideas, or system contributions that make the work stand out.

DPU-augmented gateways with larger table capacities
Unified P4 pipeline spanning Tofino and DPUs
Hierarchical co-offloading architecture for heterogeneous traffic orchestration
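
The idea of tiered offloading with a software fallback can be sketched as a toy model. This is not the paper's design: the placement policy (fill the switch table first, spill into the larger DPU table, leave the rest to software), the class names `Tier` and `HierarchicalGateway`, and the capacities are all assumptions chosen only to show how a hardware offload rate might be computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Tier:
    """One hardware tier with a bounded exact-match table (hypothetical model)."""
    name: str
    capacity: int
    table: Dict[str, str] = field(default_factory=dict)

    def install(self, key: str, action: str) -> bool:
        # Accept the entry if it already exists or there is free space.
        if key in self.table or len(self.table) < self.capacity:
            self.table[key] = action
            return True
        return False

    def lookup(self, key: str) -> Optional[str]:
        return self.table.get(key)


class HierarchicalGateway:
    """Toy tiered lookup: switch ASIC first, then DPU, then software."""

    def __init__(self, switch_capacity: int, dpu_capacity: int) -> None:
        # ASSUMPTION: the switch table is small and the DPU table is larger.
        self.hw_tiers: List[Tier] = [
            Tier("switch", switch_capacity),
            Tier("dpu", dpu_capacity),
        ]
        self.hits: Dict[str, int] = {"switch": 0, "dpu": 0, "software": 0}

    def install(self, key: str, action: str) -> str:
        """Place an entry in the first tier with room; return where it landed."""
        for tier in self.hw_tiers:
            if tier.install(key, action):
                return tier.name
        return "software"  # no hardware room left: handled on the slow path

    def forward(self, key: str) -> Tuple[str, str]:
        """Return (tier that handled the packet, action taken)."""
        for tier in self.hw_tiers:
            action = tier.lookup(key)
            if action is not None:
                self.hits[tier.name] += 1
                return tier.name, action
        self.hits["software"] += 1
        return "software", "slow-path"

    def offload_rate(self) -> float:
        """Fraction of forwarded packets handled by any hardware tier."""
        total = sum(self.hits.values())
        return (total - self.hits["software"]) / total if total else 0.0


gw = HierarchicalGateway(switch_capacity=2, dpu_capacity=4)
placements = [gw.install(f"flow{i}", f"fwd:{i}") for i in range(7)]
results = [gw.forward(f"flow{i}") for i in range(7)]
print(placements)
print(f"hardware offload rate: {gw.offload_rate():.3f}")
```

In this toy run, two flows land on the switch, four on the DPU, and one falls back to software, so six of seven packets are handled in hardware. A real gateway would place entries by traffic volume and feature requirements, not by arrival order.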
Authors

Yuemeng Xu, Peking University
Haoran Chen, ByteDance
Jiarui Guo, Peking University
Mingwei Cui, ByteDance
Qiuheng Yin, Peking University
Cheng Dong, ByteDance
Daxiang Kang, ByteDance
Xian Wu, ByteDance
Chenmin Sun, ByteDance
Peng He, ByteDance
Yang Gao, ByteDance
Lirong Lai, ByteDance
Kai Wang, ByteDance
Hongyu Wu, ByteDance
Tong Yang, Peking University
Xiyun Xu, ByteDance