🤖 AI Summary
RDMA offers high performance, but its operation set is limited, it is difficult to program, and simple application operations often require multiple round trips. Programmable SmartNICs offer another way to offload work from host CPUs, but applications must then choose among server-side RPC handlers, client-side logic over RDMA, and SmartNIC-resident logic, and the best choice varies across workloads and over time. This paper proposes NAAM (network-accelerated active messages), an active messaging framework in which small, portable eBPF functions are bound to messages and each message declares the data it accesses through an RDMA-like interface. Because eBPF code is portable, the NAAM runtime can steer any message to run its logic at the client, on a server-attached SmartNIC, or on server host CPU cores, shifting load within tens of milliseconds when server compute becomes congested. Evaluated on an NVIDIA BlueField-2 SmartNIC using its NIC-embedded switch, NAAM dynamically offloads up to 1.8 million MICA ops/s (YCSB-B) and 750,000 Cell lookups/s from server CPUs. Whereas iPipe, the state-of-the-art SmartNIC offload framework, scales to only 8 application offloads on BlueField-2, NAAM scales to hundreds with minimal impact on tail latency.
📝 Abstract
Remote Direct Memory Access (RDMA) improves host networking performance by eliminating software and server CPU involvement. However, RDMA has a limited set of operations, is difficult to program, and often requires multiple round trips to perform simple application operations. Programmable SmartNICs provide a different means to offload work from host CPUs to a NIC. This leaves applications with the complex choice of embedding logic as RPC handlers at servers, using RDMA's limited interface to access server structures via client-side logic, or running some logic on SmartNICs. The best choice varies between workloads and over time. To solve this dilemma, we present NAAM, network-accelerated active messages. NAAM applications specify small, portable eBPF functions associated with messages. Each message specifies what data it accesses using an RDMA-like interface. NAAM runs at various places in the network, including at clients, on server-attached SmartNICs, and server host CPU cores. Due to eBPF's portability, the code associated with a message can be run at any location. Hence, the NAAM runtime can dynamically steer any message to execute its associated logic wherever it makes the most sense. To demonstrate NAAM's flexibility, we built several applications, including the MICA hash table and lookups from a Cell-style B-tree. With an NVIDIA BlueField-2 SmartNIC and integrating its NIC-embedded switch, NAAM can run any of these operations on client, server, and NIC cores, shifting load in tens of milliseconds on server compute congestion. NAAM dynamically offloads up to 1.8 million MICA ops/s for YCSB-B and 750,000 Cell lookups/s from server CPUs. Finally, whereas iPipe, the state-of-the-art SmartNIC offload framework, only scales to 8 application offloads on BlueField-2, NAAM scales to hundreds of application offloads with minimal impact on tail latency due to eBPF's low overhead.
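The steering idea in the abstract can be illustrated with a small sketch. This is not NAAM's implementation or API: the `ActiveMessage` fields, the `steer` function, the load metric, and the 0.8 congestion threshold are all hypothetical, chosen only to show how a runtime might keep a message at a preferred location until that location becomes congested and then move it to the least-loaded one.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    """The three places the abstract says a message's logic can run."""
    CLIENT = "client"
    NIC = "smartnic"
    HOST = "server_host"


@dataclass(frozen=True)
class ActiveMessage:
    """Hypothetical message layout (an assumption, not NAAM's wire format)."""
    handler_id: int  # identifies the eBPF function bound to this message
    region_id: int   # memory region the handler declares it will access
    offset: int      # start of the accessed range within the region
    length: int      # size of the accessed range in bytes


def steer(load: dict[Location, float],
          preferred: Location = Location.HOST,
          congestion_threshold: float = 0.8) -> Location:
    """Choose where to execute a message's handler.

    `load` maps each location to a utilization estimate in [0, 1].
    The threshold policy is an illustrative assumption.
    """
    # Stay at the preferred location while it is not congested.
    if load[preferred] < congestion_threshold:
        return preferred
    # Otherwise shift to whichever location is currently least loaded.
    return min(load, key=lambda loc: load[loc])


msg = ActiveMessage(handler_id=1, region_id=7, offset=0, length=64)
target = steer({Location.HOST: 0.95, Location.NIC: 0.30, Location.CLIENT: 0.60})
print(f"handler {msg.handler_id} runs on {target.value}")
```

Because the handler is portable, the only per-message decision in this sketch is the target location; the message itself is unchanged regardless of where it is sent.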