AI Summary
Existing SmartNICs struggle to balance bandwidth, customizability, offload capability, and software compatibility. This work proposes SCENIC, a hardware/software co-designed SmartNIC that treats the NIC datapath as a native stream computation substrate. It places Stream Compute Units (SCUs) directly on a 200G datapath and pairs them with embedded ARM cores for control-path processing. The architecture supports user-defined offloads and programmable congestion control. It also integrates offloaded TCP/IP and RDMA protocol stacks, direct GPU/SSD connectivity, and native Linux network and RDMA verb interfaces. Experimental results show performance comparable to commercial platforms, and the design is validated on diverse applications, including collective communication offload and network-to-GPU hash-based data partitioning.
Abstract
Although modern, AI-centric datacenters rely heavily on SmartNICs, existing devices impose a hard trade-off. Commercial SmartNICs provide high bandwidth and easy software integration, but offer limited support for customization and data processing offload. In contrast, research SmartNICs often suffer from low bandwidth, limited functionality, and poor software compatibility, to the point that many are not NICs in the technical sense. This gap can be closed by treating the NIC datapath as a first-class stream computation substrate, with shared hardware/software abstractions that enable a tight co-design of infrastructure and applications. To demonstrate this, we introduce SCENIC, an open-source datacenter SmartNIC. SCENIC implements a 200G network datapath over offloaded TCP/IP and RDMA stacks, as well as a fallback path for processing arbitrary network traffic. On top of the network logic, SCENIC combines on-datapath Stream Compute Units (SCUs) for data processing with embedded ARM cores for flexible control-path manipulation, and provides direct access to GPUs and SSDs. SCENIC is fully integrated with the OS and exposes native Linux network and RDMA verb interfaces. This makes the programmable datapath transparent to existing applications while enabling control of features such as user-defined offloads and programmable congestion control. SCENIC's performance matches that of commercial platforms, and we demonstrate its versatility through several use cases, including offloaded collective communication and network-to-GPU hash-based data partitioning.
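The abstract names network-to-GPU hash-based data partitioning as a use case but does not say how SCENIC implements it. As a rough, hypothetical illustration of the general technique only, the Python sketch below routes records from a stream of packets into 2^k partition buffers using multiplicative (Fibonacci) hashing. The record format, the choice of hash, and the names `partition_of` and `partition_stream` are assumptions rather than SCENIC's API. On a SmartNIC this routing decision would sit on the datapath and write into GPU buffers, and this host-side model captures only the decision itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed record format (hypothetical): (64-bit integer key, payload bytes).
Record = Tuple[int, bytes]

GOLDEN_64 = 0x9E3779B97F4A7C15  # floor(2^64 / golden ratio), the Fibonacci-hashing constant
MASK_64 = (1 << 64) - 1


def partition_of(key: int, log2_partitions: int) -> int:
    """Map a key to one of 2**log2_partitions buckets via multiplicative hashing.

    The result is the top log2_partitions bits of (key * GOLDEN_64) mod 2^64.
    It needs only a multiply, a mask, and a shift, which suits a hardware pipeline.
    """
    return ((key * GOLDEN_64) & MASK_64) >> (64 - log2_partitions)


def partition_stream(packets: Iterable[List[Record]], log2_partitions: int) -> Dict[int, List[Record]]:
    """Consume packets in arrival order and append each record to its partition buffer.

    Each record is touched exactly once, and records that share a key keep
    their relative arrival order within their partition.
    """
    buffers: Dict[int, List[Record]] = defaultdict(list)
    for packet in packets:
        for key, payload in packet:
            buffers[partition_of(key, log2_partitions)].append((key, payload))
    return dict(buffers)


# Usage: two packets, four partitions (log2_partitions=2).
packets = [[(1, b"a"), (2, b"b")], [(1, b"c"), (42, b"d")]]
parts = partition_stream(packets, log2_partitions=2)
```

Keeping the partition count a power of two turns bucket selection into a single shift, and the multiplicative hash spreads consecutive integer keys across buckets without a lookup table. A real offload would also need to handle buffer capacity and flushing to the GPU, which this sketch omits.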