FlexiNS: A SmartNIC-Centric, Line-Rate and Flexible Network Stack

📅 2025-04-25
🤖 AI Summary
To address the performance bottleneck that traditional CPU-centric network stacks face as network bandwidth outgrows CPU speed, this paper proposes FlexiNS, a SmartNIC-centric programmable network stack. Its design introduces: (1) a header-only offloading transmit (TX) path; (2) an unlimited-working-set, in-cache receive (RX) processing path; (3) a high-performance DMA-only notification pipe; and (4) a programmable offloading engine that supports software-defined transport layers. Implemented on the NVIDIA BlueField-3 SmartNIC, the stack offers out-of-the-box compatibility with the RDMA IB Verbs (IBV) interface. In evaluation, it achieves 2.2× higher throughput than a microkernel-based baseline in disaggregated block storage and 1.3× higher throughput than a hardware-offloaded baseline in KVCache transfer, while sustaining line-rate packet processing. The authors present it as combining software transport programmability, high flexibility, and line-rate performance in a single stack.


📝 Abstract
As the gap between network and CPU speeds rapidly widens, the CPU-centric network stack proves inadequate due to excessive CPU and memory overhead. While hardware-offloaded network stacks alleviate these issues, they suffer from limited flexibility in both the control and data planes. Offloading the network stack to an off-path SmartNIC seems a promising way to provide high flexibility; however, throughput remains constrained by inherent SmartNIC architectural limitations. To this end, we design FlexiNS, a SmartNIC-centric network stack with software transport programmability and line-rate packet processing capabilities. To address the challenges induced by these SmartNIC limitations, FlexiNS introduces: (a) a header-only offloading TX path; (b) an unlimited-working-set in-cache processing RX path; (c) a high-performance DMA-only notification pipe; and (d) a programmable offloading engine. We prototype FlexiNS using the NVIDIA BlueField-3 SmartNIC and provide out-of-the-box RDMA IBV verbs compatibility to users. FlexiNS achieves 2.2× higher throughput than the microkernel-based baseline in block storage disaggregation and 1.3× higher throughput than the hardware-offloaded baseline in KVCache transfer.
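The abstract names a header-only offloading TX path but does not describe how it works. The Python sketch below is a hypothetical illustration of the general idea the name suggests, not FlexiNS's implementation. In this model, software on the SmartNIC builds only the packet header, while the payload stays in host memory and is passed to the NIC as an (address, length) gather entry. As a result, the SmartNIC cores write a fixed number of bytes regardless of message size. The header layout, `HEADER_LEN`, and all function and class names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model (not FlexiNS code) of a header-only TX path.

HEADER_LEN = 16  # hypothetical fixed header size in bytes


@dataclass(frozen=True)
class GatherEntry:
    location: str  # which memory the NIC DMA engine reads: "smartnic" or "host"
    addr: int
    length: int


@dataclass
class TxDescriptor:
    header: bytes
    gather_list: List[GatherEntry]


def build_tx_descriptor(qp_num: int, psn: int, payload_addr: int,
                        payload_len: int, header_addr: int) -> TxDescriptor:
    """Build only the header on the SmartNIC; reference the payload by address."""
    # Hypothetical header layout: QP number, PSN, payload length, zero padding.
    header = (qp_num.to_bytes(4, "big")
              + psn.to_bytes(4, "big")
              + payload_len.to_bytes(4, "big"))
    header += bytes(HEADER_LEN - len(header))
    gather = [
        GatherEntry("smartnic", header_addr, len(header)),  # header from SmartNIC memory
        GatherEntry("host", payload_addr, payload_len),      # payload straight from host memory
    ]
    return TxDescriptor(header, gather)


def bytes_written_by_smartnic_cpu(desc: TxDescriptor) -> int:
    # SmartNIC software writes only header bytes; the payload entry is metadata.
    return len(desc.header)


def bytes_gathered_by_nic(desc: TxDescriptor) -> int:
    # Total bytes the NIC DMA engine reads to assemble the outgoing data.
    return sum(entry.length for entry in desc.gather_list)


desc = build_tx_descriptor(qp_num=7, psn=100, payload_addr=0x1000_0000,
                           payload_len=4096, header_addr=0x2000)
print(bytes_written_by_smartnic_cpu(desc), bytes_gathered_by_nic(desc))  # 16 4112
```

With a 4096-byte payload, the SmartNIC-side software writes 16 header bytes while the NIC gathers 4112 bytes in total. Growing the payload changes only the host gather entry's length. Keeping payload bytes off the SmartNIC's cores and memory is the presumed benefit of such a path.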
Problem

Addresses CPU and memory overhead in CPU-centric network stacks
Overcomes flexibility limitations in hardware-offloaded network stacks
Enhances SmartNIC throughput despite architectural constraints
Innovation

Header-only offloading TX path
Unlimited-working-set in-cache RX path
High-performance DMA-only notification pipe
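The DMA-only notification pipe is likewise only named on this page. The sketch below is a hedged illustration of one common pattern such a pipe could follow; it is an assumption, not the paper's design. It models a single-producer, single-consumer ring in which every slot carries a sequence number. The producer writes entries directly into the consumer's memory, and the consumer discovers new entries by polling for the expected sequence number instead of waiting for a doorbell or interrupt. `NotificationRing`, `push`, and `poll` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical model (not FlexiNS code) of a DMA-only notification ring.


class NotificationRing:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Each slot holds (sequence number, payload); seq -1 means "never written".
        self.slots: List[Tuple[int, Optional[bytes]]] = [(-1, None)] * capacity
        self.prod_seq = 0  # next sequence number the producer will write
        self.cons_seq = 0  # next sequence number the consumer expects

    def push(self, payload: bytes) -> bool:
        # Producer side: refuse if the ring is full. A real design would learn the
        # consumer's progress via a credit written back to it; this model reads
        # cons_seq directly for simplicity.
        if self.prod_seq - self.cons_seq >= self.capacity:
            return False
        # Payload and sequence number are stored together, modeling one DMA write
        # so the consumer never sees a new sequence number with a stale payload.
        self.slots[self.prod_seq % self.capacity] = (self.prod_seq, payload)
        self.prod_seq += 1
        return True

    def poll(self) -> Optional[bytes]:
        # Consumer side: a slot is new only if its seq equals the expected one.
        seq, payload = self.slots[self.cons_seq % self.capacity]
        if seq != self.cons_seq:
            return None
        self.cons_seq += 1
        return payload
```

Tagging each slot with its sequence number lets the consumer tell a fresh entry from a stale one left over from the previous lap around the ring. In this model, the consumer therefore never has to read a producer index over PCIe.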
Authors
Xuzheng Chen, Zhejiang University
Jie Zhang, Zhejiang University
Baolin Zhu, Zhejiang University
Xueying Zhu, Zhejiang University
Zhongqing Chen, Alibaba Cloud
Shu Ma, Alibaba Cloud
Lingjun Zhu, Alibaba Cloud
Chao Shi, Alibaba Cloud
Yin Zhang, Zhejiang University
Zeke Wang, Zhejiang University