A Distributed Learned Hash Table

📅 2025-08-19
🤖 AI Summary
Distributed Hash Tables (DHTs) natively lack efficient range query support, limiting their applicability in large language model (LLM) serving, distributed databases, and blockchain systems. This paper proposes LEAD, the first DHT design that deeply integrates recursive machine learning models into its core architecture while preserving key-space ordering to enable low-overhead range queries. Key contributions include: (1) a learned order-preserving hash mapping; (2) a distributed recursive indexing scheme; (3) low-latency adaptive routing; and (4) an elastic network adaptation mechanism. Experiments on a testbed implementation and in large-scale simulations show that LEAD reduces query latency and message cost by 80% to 90% compared to existing range query methods, while remaining scalable and robust under network churn.

📝 Abstract
Distributed Hash Tables (DHTs) are pivotal in numerous high-impact key-value applications built on distributed networked systems, offering a decentralized architecture that avoids single points of failure and improves data availability. Despite their widespread utility, DHTs face substantial challenges in handling range queries, which are crucial for applications such as LLM serving, distributed storage, databases, content delivery networks, and blockchains. To address this limitation, we present LEAD, a novel system that incorporates learned models into DHT structures to optimize range query performance. LEAD uses a recursive machine learning model to map and retrieve data across a distributed system while preserving the inherent order of the data. Its design minimizes range query latency and message cost while maintaining high scalability and resilience to network churn. Our evaluations, conducted on both a testbed implementation and in simulations, show that LEAD reduces query latency and message cost by 80% to 90%+ compared to existing range query methods in large-scale distributed systems. LEAD also remains scalable and robust under system churn, providing an efficient data retrieval solution for distributed key-value systems.
Problem

Research questions and friction points this paper is trying to address.

Optimizes range query performance in distributed hash tables
Reduces latency and message costs for distributed key-value systems
Enhances scalability and resilience against network churn
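
The friction behind these points can be illustrated with a minimal sketch that contrasts conventional uniform-hash placement with an order-preserving placement. It is not taken from the paper: the node count, the key domain, and the helper names (`hashed_node`, `ordered_node`, `nodes_touched`) are all assumptions made for illustration.

```python
import hashlib

NUM_NODES = 16          # hypothetical number of DHT nodes
KEY_SPACE = 1_000_000   # hypothetical integer key domain [0, KEY_SPACE)

def hashed_node(key: int) -> int:
    """Conventional placement: uniform hash of the key, modulo node count."""
    digest = hashlib.sha1(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES

def ordered_node(key: int) -> int:
    """Order-preserving placement: node index never decreases as key grows."""
    return key * NUM_NODES // KEY_SPACE

def nodes_touched(place, lo: int, hi: int) -> set:
    """Set of nodes that hold at least one key in the closed range [lo, hi]."""
    return {place(k) for k in range(lo, hi + 1)}

range_lo, range_hi = 500_000, 500_999
hashed = nodes_touched(hashed_node, range_lo, range_hi)
ordered = nodes_touched(ordered_node, range_lo, range_hi)
print("nodes touched with uniform hashing:", len(hashed))
print("nodes touched with ordered placement:", len(ordered))
```

Under uniform hashing the thousand adjacent keys are scattered across the nodes, so a range query has to contact many of them. The ordered placement keeps the range on neighbouring nodes, but this naive version only balances load when keys are evenly distributed, which is the gap a learned mapping aims to close.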
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned models integrated into DHT structures
Recursive ML model for ordered data mapping
Minimizes latency and message cost while maintaining scalability
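
As a rough illustration of how a recursive learned model can map keys to nodes while preserving order, the sketch below fits a two-stage, piecewise-linear approximation of the key distribution (CDF) and uses it to pick a node. This is a deliberate simplification and not LEAD's actual model or routing scheme: the class `TwoStageCDF`, the helpers `node_for` and `range_nodes`, the equal-width leaves, and the sample data are assumptions.

```python
from bisect import bisect_right

class TwoStageCDF:
    """Piecewise-linear approximation of the key CDF (illustrative only)."""

    def __init__(self, sample_keys, num_leaves=4):
        keys = sorted(sample_keys)
        self.lo, self.hi = keys[0], keys[-1]
        self.num_leaves = num_leaves
        self.width = (self.hi - self.lo) / num_leaves
        # Leaf boundaries in key space and the empirical CDF at each boundary.
        self.bounds = [self.lo + i * self.width for i in range(num_leaves + 1)]
        self.cdf = [bisect_right(keys, b) / len(keys) for b in self.bounds]
        self.cdf[0], self.cdf[-1] = 0.0, 1.0

    def predict(self, key):
        """Return a value in [0, 1] that never decreases as key increases."""
        if key <= self.lo:
            return 0.0
        if key >= self.hi:
            return 1.0
        # Stage 1 (root): choose the leaf model responsible for this key.
        leaf = min(int((key - self.lo) / self.width), self.num_leaves - 1)
        # Stage 2 (leaf): interpolate linearly inside that leaf.
        frac = (key - self.bounds[leaf]) / self.width
        frac = min(max(frac, 0.0), 1.0)  # guard against float rounding
        return self.cdf[leaf] + frac * (self.cdf[leaf + 1] - self.cdf[leaf])

def node_for(model, key, num_nodes):
    """Map a key to a node index; monotone because predict() is monotone."""
    return min(int(model.predict(key) * num_nodes), num_nodes - 1)

def range_nodes(model, lo, hi, num_nodes):
    """Contiguous run of nodes that can hold keys in [lo, hi]."""
    return list(range(node_for(model, lo, num_nodes),
                      node_for(model, hi, num_nodes) + 1))

# Skewed sample: squares of 1..1000, so small keys are far denser than large ones.
model = TwoStageCDF([i * i for i in range(1, 1001)], num_leaves=8)
print(range_nodes(model, 250_000, 260_000, num_nodes=16))
```

Because the predicted position never decreases with the key, a range query only needs to visit the contiguous run of nodes between the two endpoints, and fitting the CDF spreads skewed keys more evenly than a fixed linear split would.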
Authors

Shengze Wang
University of California Santa Cruz
Yi Liu
University of California Santa Cruz
Xiaoxue Zhang
University of Nevada, Reno
Computer Networks, Blockchain
Liting Hu
University of California Santa Cruz
stream processing systems, cloud and edge computing, distributed systems, operating systems virtualization
Chen Qian
University of California Santa Cruz