F2: Designing a Key-Value Store for Large Skewed Workloads

📅 2023-05-02

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Existing LSM-tree and B-tree-based KV stores, as well as CPU-optimized systems such as FASTER, face bottlenecks in index memory overhead, log-compaction efficiency, and hot working-set management when serving large-scale KV workloads that are memory-constrained, highly skewed, and far larger than main memory. Method: This paper proposes F2, a tiered, record-oriented KV store that combines (i) a latch-free multi-threaded log compaction mechanism, (ii) a latch-free read-modify-write (RMW) algorithm, (iii) a two-level hash index for cold records together with a read cache for read-hot records, and (iv) LSM-style hot-cold record separation integrated with a CPU-optimized latch-free concurrency engine targeting NVMe SSDs. Results: In memory-constrained environments, F2 achieves on average 2.04× to 11.75× higher throughput than state-of-the-art systems (11.75× over RocksDB, 10.56× over KVell, 4.52× over SplinterDB, 2.38× over FASTER, and 2.04× over LeanStore), maintains high performance across varying skew levels and memory budgets, and keeps disk write amplification low.

📝 Abstract

Many real-world workloads present a challenging set of requirements: point operations requiring high throughput, working sets much larger than main memory, and natural skew in key access patterns for both reads and writes. We find that modern key-value designs are either optimized for memory efficiency, sacrificing high performance (LSM-tree designs), or achieve high performance, saturating modern NVMe SSD bandwidth, at the cost of substantial memory resources or high disk wear (CPU-optimized designs). Unfortunately, these designs are not able to meet the challenging demands of such larger-than-memory, skewed workloads. To this end, we present F2, a new key-value store that bridges this gap by combining the strengths of both approaches. F2 adopts a tiered, record-oriented architecture inspired by LSM-trees to effectively separate hot from cold records, while incorporating concurrent latch-free mechanisms from CPU-optimized engines to maximize performance on modern NVMe SSDs. To realize this design, we tackle key challenges and introduce several innovations, including new latch-free algorithms for multi-threaded log compaction and user operations (e.g., RMWs), as well as new components: a two-level hash index to reduce indexing overhead for cold records and a read-cache for serving read-hot data. Detailed experimental results show that F2 matches or outperforms existing solutions, achieving on average better throughput in memory-constrained environments compared to state-of-the-art systems like RocksDB (11.75x), SplinterDB (4.52x), KVell (10.56x), LeanStore (2.04x), and FASTER (2.38x). F2 also maintains its high performance across varying workload skewness levels and memory budgets, while achieving low disk write amplification.

Problem

Research questions and friction points this paper is trying to address.

Handling large skewed workloads in key-value stores

Reducing indexing and compaction overheads

Optimizing performance for read-hot and write-hot sets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-tier record-oriented design for skewed workloads

Latch-free algorithms for multi-threaded log compaction

Two-level hash index to reduce cold record overhead
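To make the hot/cold separation and read cache from the abstract more concrete, here is a minimal single-threaded sketch in Python. It is not F2's implementation: the class name `TieredKV`, its methods, and the capacity parameters are hypothetical, plain dictionaries stand in for the on-disk logs and the hash indexes, and there is no latch-free concurrency. It only illustrates the lookup order (hot tier, then read cache, then cold tier) and the movement of older records to a cold tier.

```python
from collections import OrderedDict


class TieredKV:
    """Toy model of hot/cold record tiering with a read cache (illustrative only)."""

    def __init__(self, hot_capacity, cache_capacity):
        self.hot = OrderedDict()         # recently written records (stand-in for a hot tier)
        self.cold = {}                   # older records (stand-in for a cold tier)
        self.read_cache = OrderedDict()  # LRU cache of records read from the cold tier
        self.hot_capacity = hot_capacity
        self.cache_capacity = cache_capacity

    def upsert(self, key, value):
        # Drop any cached copy so a later read cannot return a stale value.
        self.read_cache.pop(key, None)
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Simplified "compaction": move the least recently written records to cold.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

    def read(self, key):
        if key in self.hot:
            return self.hot[key]
        if key in self.read_cache:
            self.read_cache.move_to_end(key)
            return self.read_cache[key]
        if key in self.cold:
            value = self.cold[key]
            # Cache read-hot records that live in the cold tier.
            self.read_cache[key] = value
            while len(self.read_cache) > self.cache_capacity:
                self.read_cache.popitem(last=False)
            return value
        return None

    def rmw(self, key, update, initial):
        # Read-modify-write: apply `update` to the current value, or start from `initial`.
        current = self.read(key)
        new_value = initial if current is None else update(current)
        self.upsert(key, new_value)
        return new_value
```

In this sketch, a record updated through `rmw` always lands back in the hot tier, so write-hot keys stay there, while keys that are only read from the cold tier are served from the read cache after the first access.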
