🤖 AI Summary
To address two critical bottlenecks in memory-disaggregated key-value stores, namely excessive reliance on one-sided atomic operations for index processing and low cache efficiency at compute nodes, this paper proposes FlexKV, a KV store that dynamically offloads (proxies) the index to compute nodes, leveraging their powerful CPUs. It introduces three key techniques: (1) rank-aware hotness detection to continuously balance index load across compute nodes; (2) a two-level memory optimization scheme to make efficient use of limited compute-node memory; and (3) an RPC-aggregated cache management mechanism that reduces cache coherence overhead. Experimental evaluation demonstrates that FlexKV achieves up to 2.94× higher throughput and reduces latency by up to 85.2% compared to state-of-the-art memory-disaggregated KV stores, alleviating both remote index-processing and cache coherence bottlenecks.
📝 Abstract
Disaggregated memory (DM) is a promising data center architecture that decouples CPU and memory into independent resource pools to improve resource utilization. Building on DM, memory-disaggregated key-value (KV) stores are adopted to efficiently manage remote data. Unfortunately, existing approaches suffer from poor performance due to two critical issues: 1) the overdependence on one-sided atomic operations in index processing, and 2) the limited efficiency of compute-side caches. To address these issues, we propose FlexKV, a memory-disaggregated KV store with index proxying. Our key idea is to dynamically offload the index to compute nodes, leveraging their powerful CPUs to accelerate index processing and maintain high-performance compute-side caches. Three challenges have to be addressed to enable efficient index proxying on DM, i.e., the load imbalance across compute nodes, the limited memory of compute nodes, and the expensive cache coherence overhead. FlexKV proposes: 1) a rank-aware hotness detection algorithm to continuously balance index load across compute nodes, 2) a two-level CN memory optimization scheme to efficiently utilize compute node memory, and 3) an RPC-aggregated cache management mechanism to reduce cache coherence overhead. The experimental results show that FlexKV improves throughput by up to 2.94× and reduces latency by up to 85.2%, compared with the state-of-the-art memory-disaggregated KV stores.
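The abstract does not detail how rank-aware hotness detection works, but the general idea of ranking keys by recent access frequency and treating the top-ranked fraction as hot (so their index entries can be proxied on compute nodes) can be sketched as follows. All names, the epoch-based reset, and the `hot_fraction` cutoff are illustrative assumptions, not FlexKV's actual algorithm:

```python
from collections import Counter

class HotnessDetector:
    """Hypothetical sketch of rank-based hotness detection: count key
    accesses over an epoch, rank keys by frequency, and flag the top
    fraction as hot. Not FlexKV's actual algorithm."""

    def __init__(self, hot_fraction=0.1, epoch_size=10_000):
        self.hot_fraction = hot_fraction  # top fraction of ranked keys treated as hot
        self.epoch_size = epoch_size      # number of accesses per detection epoch
        self.counts = Counter()
        self.hot_keys = set()
        self.accesses = 0

    def record_access(self, key):
        self.counts[key] += 1
        self.accesses += 1
        if self.accesses >= self.epoch_size:
            self._rank_and_reset()

    def _rank_and_reset(self):
        # Rank keys by access count within the epoch; keep the top fraction.
        ranked = [k for k, _ in self.counts.most_common()]
        cutoff = max(1, int(len(ranked) * self.hot_fraction))
        self.hot_keys = set(ranked[:cutoff])
        # Reset counters so the hot set adapts to workload shifts.
        self.counts.clear()
        self.accesses = 0

    def is_hot(self, key):
        return key in self.hot_keys
```

A coordinator could periodically query each compute node's hot set to rebalance proxied index partitions; the paper's continuous balancing across compute nodes would layer on top of such per-node detection.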