Varuna: Enabling Failure-Type Aware RDMA Failover

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiencies of existing fault-tolerance mechanisms for RDMA link failures, which uniformly retransmit all in-flight requests upon connection disruption, leading to wasted bandwidth, semantic errors, and high recovery overhead. The authors propose a failure-type-aware recovery mechanism that leverages a lightweight completion log to track the execution status of each request, thereby distinguishing between already executed and unexecuted operations. By selectively retransmitting only necessary requests and restoring their results, this approach achieves precise, execution-state-based retransmission for the first time. It prevents redundant execution of non-idempotent operations, ensures transactional consistency, and eliminates the need for connection reestablishment. Experimental results demonstrate negligible memory overhead, a steady-state latency penalty of merely 0.6–10%, and a 65% reduction in recovery retransmission time.
📝 Abstract
RDMA link failures can render connections temporarily unavailable, causing both performance degradation and significant recovery overhead. To tolerate such failures, production datacenters pair each primary link with a standby link and, upon failure, uniformly retransmit all in-flight RDMA requests over the backup path. However, we observe that such blanket retransmission is unnecessary. In-flight requests can be split into pre-failure and post-failure categories depending on whether the responder has already executed them. Retransmitting post-failure requests is not only redundant (consuming bandwidth), but also incorrect for non-idempotent operations, where duplicate execution can violate application semantics. We present Varuna, a failure-type-aware RDMA recovery mechanism that enables correct retransmission and microsecond-level failover. Varuna piggybacks a lightweight completion log on every RDMA operation; after a link failure, this log deterministically reveals which in-flight requests were executed (post-failure) and which were lost (pre-failure). Varuna then retransmits only the pre-failure subset and recovers the return values of post-failure requests. Evaluated using synthetic microbenchmarks and end-to-end RDMA TPC-C transactions, Varuna incurs only 0.6-10% steady-state latency overhead in realistic applications, eliminates 65% of recovery retransmission time, preserves transactional consistency, and introduces zero connectivity rebuild overhead and negligible memory overhead during RDMA failover.
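To make the pre-failure / post-failure split concrete, here is a minimal sketch of how a requester might classify in-flight requests once it can read a completion log. This is an illustration only, not Varuna's implementation: the `Request` type, the `classify_in_flight` function, the per-request sequence numbers, and the dictionary-shaped log are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Request:
    seq: int      # hypothetical per-connection sequence number
    op: str       # e.g. "WRITE" or "FETCH_ADD"
    payload: int


def classify_in_flight(
    in_flight: List[Request],
    completion_log: Dict[int, Optional[int]],
) -> Tuple[List[Request], Dict[int, Optional[int]]]:
    """Split in-flight requests by execution status.

    completion_log (assumed shape) maps the sequence number of every
    request the responder executed to its return value (None if the
    operation returns nothing).
    """
    to_retransmit: List[Request] = []             # pre-failure: lost, must be resent
    recovered: Dict[int, Optional[int]] = {}      # post-failure: executed, restore result
    for req in in_flight:
        if req.seq in completion_log:
            recovered[req.seq] = completion_log[req.seq]
        else:
            to_retransmit.append(req)
    return to_retransmit, recovered


if __name__ == "__main__":
    in_flight = [
        Request(1, "FETCH_ADD", 5),   # executed before the link failed
        Request(2, "WRITE", 7),       # executed, no return value
        Request(3, "FETCH_ADD", 1),   # never reached the responder
    ]
    log = {1: 100, 2: None}
    resend, results = classify_in_flight(in_flight, log)
    print([r.seq for r in resend], results)
```

In this sketch only request 3 would be resent over the backup path. Resending request 1 as well would apply the non-idempotent `FETCH_ADD` twice, which is the semantic violation the abstract describes.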
Problem

Research questions and friction points this paper is trying to address.

RDMA failover
failure-type awareness
non-idempotent operations
retransmission redundancy
transactional consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

failure-type-aware
RDMA failover
completion log
non-idempotent operations
microsecond-level recovery
Xiaoyang Wang
University of Science and Technology of China
Yongkun Li
University of Science and Technology of China
Storage System, Memory and File System, Key-value System, Graph System
Lulu Yao
Ningbo University
Guoli Wei
University of Science and Technology of China
Longcheng Yang
University of Science and Technology of China
Yinlong Xu
University of Science and Technology of China
Weiqing Kong
Huawei
Weiguang Wang
Huawei
Peng Dong
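Shanghai Jiao Tong University
Sensor Networks, Information Fusion, Nonlinear Filtering, Target Tracking, Aircraft Navigation, Guidance and Control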
Bingyang Liu
Tsinghua University
Computer Networks