Network-accelerated Active Messages

📅 2025-09-09

🤖 AI Summary

RDMA offers high performance, but it has a limited set of operations, is difficult to program, and often needs multiple round trips for a single application operation. SmartNICs offer another way to offload work from host CPUs, which leaves applications to choose among server-side RPC handlers, client-side logic over RDMA, and SmartNIC offload, with the best choice varying by workload and over time. This paper proposes NAAM (network-accelerated active messages), a framework that combines RDMA-style data access with programmable SmartNICs by binding lightweight eBPF functions to messages, which are then dynamically scheduled for execution on clients, on server host CPUs, or on SmartNIC cores. Its key idea is to exploit eBPF's portability so the same message logic can run at any of these locations; the runtime steers messages according to current load, shifting work within tens of milliseconds when server compute becomes congested. Evaluated on NVIDIA BlueField-2 SmartNICs using the NIC-embedded switch, NAAM dynamically offloads up to 1.8M MICA ops/s (YCSB-B) and 750K Cell B-tree lookups/s from server CPUs, and it scales to hundreds of concurrent application offloads with minimal impact on tail latency, whereas iPipe, the state-of-the-art SmartNIC offload framework, scales to only 8 offloads on BlueField-2.

📝 Abstract

Remote Direct Memory Access (RDMA) improves host networking performance by eliminating software and server CPU involvement. However, RDMA has a limited set of operations, is difficult to program, and often requires multiple round trips to perform simple application operations. Programmable SmartNICs provide a different means to offload work from host CPUs to a NIC. This leaves applications with the complex choice of embedding logic as RPC handlers at servers, using RDMA's limited interface to access server structures via client-side logic, or running some logic on SmartNICs. The best choice varies between workloads and over time. To solve this dilemma, we present NAAM, network-accelerated active messages. NAAM applications specify small, portable eBPF functions associated with messages. Each message specifies what data it accesses using an RDMA-like interface. NAAM runs at various places in the network, including at clients, on server-attached SmartNICs, and server host CPU cores. Due to eBPF's portability, the code associated with a message can be run at any location. Hence, the NAAM runtime can dynamically steer any message to execute its associated logic wherever it makes the most sense. To demonstrate NAAM's flexibility, we built several applications, including the MICA hash table and lookups from a Cell-style B-tree. With an NVIDIA BlueField-2 SmartNIC and integrating its NIC-embedded switch, NAAM can run any of these operations on client, server, and NIC cores, shifting load in tens of milliseconds on server compute congestion. NAAM dynamically offloads up to 1.8 million MICA ops/s for YCSB-B and 750,000 Cell lookups/s from server CPUs. Finally, whereas iPipe, the state-of-the-art SmartNIC offload framework, only scales to 8 application offloads on BlueField-2, NAAM scales to hundreds of application offloads with minimal impact on tail latency due to eBPF's low overhead.
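To make the programming model concrete, here is a minimal Python sketch under stated assumptions. The names `ActiveMessage`, `execute`, and `sum_counters` are hypothetical, server memory is modeled as a byte string, and a Python callable stands in for the eBPF function. It only illustrates the two properties the abstract relies on: each message declares the regions it reads through an RDMA-like (offset, length) interface, and because the handler does not depend on where it runs, executing it at the client, the SmartNIC, or a server core yields the same result.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model of a NAAM-style active message; names are illustrative only.
Region = Tuple[int, int]  # (offset, length) into server memory, RDMA-style


@dataclass
class ActiveMessage:
    regions: List[Region]                    # data the message declares it will access
    handler: Callable[[List[bytes]], bytes]  # stands in for the portable eBPF function


def execute(msg: ActiveMessage, memory: bytes, location: str) -> Tuple[str, bytes]:
    """Run the message's handler at `location` over the regions it declared.

    In this toy model every location sees the same memory, so the result
    does not depend on where the handler runs; only the cost would.
    """
    fetched = [memory[off:off + length] for off, length in msg.regions]
    return location, msg.handler(fetched)


# Example handler: sum two little-endian 4-byte counters.
def sum_counters(chunks: List[bytes]) -> bytes:
    total = sum(int.from_bytes(c, "little") for c in chunks)
    return total.to_bytes(4, "little")


server_memory = (7).to_bytes(4, "little") + (35).to_bytes(4, "little")
msg = ActiveMessage(regions=[(0, 4), (4, 4)], handler=sum_counters)

for where in ("client", "smartnic", "server-cpu"):
    loc, result = execute(msg, server_memory, where)
    print(loc, int.from_bytes(result, "little"))  # 42 at every location
```

Declaring the accessed regions up front is what lets a runtime fetch the data on behalf of whichever location ends up running the handler; in this sketch that fetch is a plain slice.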

Problem

Research questions and friction points this paper is trying to address.

Addressing RDMA's limited operations and programming complexity

Choosing where to run application logic among clients, servers, and SmartNICs when the best choice varies by workload and over time

Enabling dynamic workload distribution across network endpoints
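The round-trip cost behind the first point can be illustrated with a toy model, which is an assumption and not NAAM's or Cell's actual data layout. A client that walks a remote tree using only one-sided reads must finish each read before it learns the next node's address, so a lookup costs one round trip per level, whereas shipping the traversal to the data as a message costs a single request/response. `traverse`, `client_side_lookup`, and `active_message_lookup` are hypothetical names, and the model ignores client-side caching of upper tree levels, which can reduce the count.

```python
import bisect
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy index: internal nodes route by separator keys,
# leaves map keys to values. This is not the Cell B-tree layout.
NODES: Dict[int, dict] = {
    0: {"keys": [50], "children": [1, 2]},
    1: {"keys": [20], "children": [3, 4]},
    2: {"keys": [80], "children": [5, 6]},
    3: {"keys": [5, 10], "values": ["a", "b"]},
    4: {"keys": [20, 30], "values": ["c", "d"]},
    5: {"keys": [50, 60], "values": ["e", "f"]},
    6: {"keys": [80, 90], "values": ["g", "h"]},
}


def traverse(fetch: Callable[[int], dict], root: int, key: int) -> Optional[str]:
    """Walk from the root to a leaf, reading nodes through `fetch`."""
    node = fetch(root)
    while "children" in node:
        # The child's address is only known after the parent has been read.
        node = fetch(node["children"][bisect.bisect_right(node["keys"], key)])
    i = bisect.bisect_left(node["keys"], key)
    if i < len(node["keys"]) and node["keys"][i] == key:
        return node["values"][i]
    return None


def client_side_lookup(key: int) -> Tuple[Optional[str], int]:
    """One-sided reads issued by the client: one round trip per node read."""
    trips = 0

    def remote_read(node_id: int) -> dict:
        nonlocal trips
        trips += 1  # each dependent read waits a full network round trip
        return NODES[node_id]

    return traverse(remote_read, 0, key), trips


def active_message_lookup(key: int) -> Tuple[Optional[str], int]:
    """Traversal shipped to the data (server core or NIC): one round trip."""
    return traverse(NODES.__getitem__, 0, key), 1


print(client_side_lookup(30))     # ('d', 3)
print(active_message_lookup(30))  # ('d', 1)
```

The gap grows with tree depth, which is why pointer-chasing structures are a natural fit for logic that runs next to the data.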

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses portable eBPF functions for message logic

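Dynamically steers each message to the client, SmartNIC, or server host core where its logic runs best under current load

Uses an RDMA-like interface in which each message declares the data it accesses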
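Dynamic steering can be sketched as a cost-based choice over candidate locations. The policy below is an assumption for illustration and is not NAAM's published algorithm. Each hypothetical `Location` carries a made-up base cost and a periodically sampled utilization, expected cost is inflated M/M/1-style by a factor of 1/(1 - utilization), and `steer` picks the cheapest location for the next message. The client's higher base cost stands in for the extra RDMA round trips of client-side execution.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Location:
    name: str
    base_cost_us: float  # hypothetical per-message cost on an idle core
    utilization: float   # recently sampled busy fraction in [0, 1]

    def expected_cost_us(self) -> float:
        # M/M/1-style inflation: cost grows sharply as utilization nears 1.
        if self.utilization >= 1.0:
            return float("inf")
        return self.base_cost_us / (1.0 - self.utilization)


def steer(locations: Dict[str, Location]) -> str:
    """Pick the location with the lowest expected cost for the next message."""
    return min(locations.values(), key=lambda loc: loc.expected_cost_us()).name


locs = {
    "server-cpu": Location("server-cpu", base_cost_us=2.0, utilization=0.2),
    "smartnic": Location("smartnic", base_cost_us=4.0, utilization=0.1),
    "client": Location("client", base_cost_us=10.0, utilization=0.0),
}
print(steer(locs))  # server-cpu

locs["server-cpu"].utilization = 0.9  # server compute congestion
print(steer(locs))  # smartnic

locs["smartnic"].utilization = 0.7  # NIC cores now busy as well
print(steer(locs))  # client
```

Because any message can run anywhere, the policy only has to rank locations; a practical runtime would also smooth the utilization samples or add hysteresis so that messages do not oscillate between locations.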
