🤖 AI Summary
Data centers face a fundamental tension between strong consistency and high performance in distributed consensus protocols. Method: This paper proposes the first full hardware offload solution for linearizable replication, migrating Multi-Paxos and Raft protocol execution entirely onto programmable SmartNICs and thereby bypassing the host CPU and kernel network stack bottlenecks. We introduce a novel co-design of the protocol state machine and on-NIC memory management, combined with zero-copy RDMA, hardware-supported atomic operations, and persistent logging. Results: Evaluated on real clusters, our design achieves 3.8× higher throughput, reduces end-to-end latency by 92%, and lowers host CPU utilization to just 8% of the baseline, while strictly preserving correctness and linearizability under failures.
📝 Abstract
Today's datacenter applications rely on datastores that must provide high availability, consistency, and performance. To achieve high availability, these datastores replicate data across several nodes. Replication is managed by a reliable protocol that keeps the replicas consistent under a given consistency model, even in the presence of faults. Many applications favor strong consistency models over weaker ones because they give clients more intuitive behavior. Furthermore, to meet the demands of high online traffic, datastores must offer high throughput and low latency. However, delivering both strong consistency and high performance simultaneously can be challenging. Reliable replication protocols typically require multiple rounds of communication over the network stack, which adds latency and increases the load on network resources. These protocols also consume considerable CPU resources, which degrades overall application performance, especially in high-throughput environments. In this work, we aim to design a hardware-accelerated system for replication protocols that addresses these challenges. Our approach offloads the replication protocol onto SmartNICs, specialized network interface cards that can be programmed to implement custom logic directly on the NIC. By doing so, we aim to improve performance while preserving strong consistency and to free valuable CPU cycles for application logic.