🤖 AI Summary
To address the bottlenecks of slow connection establishment and poor resource sharing in RDMA control planes under dynamic scaling in elastic computing, this paper proposes a lightweight co-optimization approach: (1) caching-optimized libibverbs to significantly accelerate both cold and warm starts; and (2) leveraging RDMA’s native fork mechanism to enable secure, efficient inter-process reuse of user-space RDMA resources. We are the first to demonstrate that user-space RDMA connection setup can be accelerated via caching and that RDMA resources can be shared across processes via fork—challenging the conventional wisdom that microsecond-level control-plane optimization is indispensable. Our solution is deeply integrated with a serverless framework (OpenWhisk), enabling a redesigned user-space RDMA control plane. Experiments show that, compared to the baseline, our approach achieves 30.56–46.50% higher average throughput, reduces end-to-end latency by 18.55–37.21%, and incurs only 6.5% additional control-plane overhead.
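The caching idea in (1) can be sketched with a short illustration. The names below (`open_device_context`, `cached_context`) are hypothetical stand-ins for the expensive libibverbs control-plane calls (e.g. device open and PD/QP allocation) whose results the paper's approach caches; this is a sketch of the principle, not the actual implementation.

```python
import functools
import time

SETUP_CALLS = 0  # counts how many times the expensive setup path actually runs

def open_device_context(device_name):
    """Hypothetical stand-in for slow control-plane setup (device open,
    protection-domain and queue-pair allocation) that dominates cold starts."""
    global SETUP_CALLS
    SETUP_CALLS += 1
    time.sleep(0.01)  # pretend this costs milliseconds
    return {"device": device_name, "pd": object()}

@functools.lru_cache(maxsize=None)
def cached_context(device_name):
    # Warm starts hit the cache and skip the expensive setup entirely.
    return open_device_context(device_name)

cold = cached_context("mlx5_0")   # cold start: pays the full setup cost once
warm = cached_context("mlx5_0")   # warm start: served from the cache
assert cold is warm and SETUP_CALLS == 1
```

The cache trades a small amount of memory (retained contexts) for skipping setup latency on every warm invocation, which mirrors the cold/warm distinction in the summary above.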
📝 Abstract
Elastic computing enables dynamic scaling to meet workload demands, and Remote Direct Memory Access (RDMA) enhances this by providing high-throughput, low-latency network communication. However, integrating RDMA into elastic computing remains a challenge, particularly in control plane operations for RDMA connection setup. This paper revisits the assumptions of prior work on high-performance RDMA for elastic computing and reveals that extreme microsecond-level control plane optimizations are often unnecessary. By challenging the conventional beliefs about the slowness of the user-space RDMA control plane and the difficulty of user-space RDMA resource sharing, we uncover new design opportunities. Our key insight is that user-space RDMA connection setup can be significantly accelerated with caching, while RDMA resources can be efficiently shared among processes using fork. In light of this, we propose Swift, a simple yet effective solution that co-designs RDMA with a serverless framework to optimize performance for elastic computing. At its core, Swift handles cold and warm serverless requests by swiftly initializing the RDMA control plane with cache-optimized libibverbs, and manages fork requests by leveraging RDMA's fork capability. Implemented with OpenWhisk, Swift delivers 30.56–46.50% higher average throughput and 18.55–37.21% lower latency, at a cost of 6.5% control plane overhead, compared to prior solutions.
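The fork-based sharing builds on libibverbs' existing fork support: `ibv_fork_init()` must be called before RDMA resources are created so that registered memory remains usable in the child after `fork()`. As a minimal, RDMA-free analogy (the buffer below is an ordinary `bytearray`, not a registered memory region), the inheritance pattern a forked worker relies on looks like this:

```python
import os

# Stand-in for an RDMA resource (e.g. a registered memory buffer) that the
# parent sets up once. With real libibverbs, ibv_fork_init() must be called
# before creating such resources so they survive fork() in the child.
shared_buffer = bytearray(b"rdma-ready")

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: inherits the parent's resource without repeating connection setup.
    os.close(r)
    os.write(w, bytes(shared_buffer))
    os._exit(0)
else:
    os.close(w)
    inherited = os.read(r, 64)  # observe the inherited state from the child
    os.waitpid(pid, 0)
    assert inherited == b"rdma-ready"
```

The design point is that the child reuses state the parent already paid for, which is how Swift serves fork requests without re-running the RDMA control plane.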