A Distributed Learned Hash Table

📅 2025-08-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Distributed Hash Tables (DHTs) natively lack efficient range query support, which limits their applicability in large language model (LLM) serving, distributed databases, and blockchain systems. This paper proposes LEAD, a novel DHT design that integrates a recursive machine learning model into its core architecture while preserving key-space ordering, so that range queries can be served with low overhead. Key contributions include: (1) a learned order-preserving hash mapping; (2) a distributed recursive indexing scheme; (3) low-latency adaptive routing; and (4) an elastic network adaptation mechanism. Experiments on a testbed implementation and in large-scale simulations show that LEAD reduces range query latency and message cost by 80% to 90% or more compared with existing range query methods, while remaining scalable and robust under network churn.

📝 Abstract

Distributed Hash Tables (DHTs) are pivotal in numerous high-impact key-value applications built on distributed networked systems, offering a decentralized architecture that avoids single points of failure and improves data availability. Despite their widespread utility, DHTs face substantial challenges in handling range queries, which are crucial for applications such as LLM serving, distributed storage, databases, content delivery networks, and blockchains. To address this limitation, we present LEAD, a novel system that incorporates learned models within DHT structures to significantly improve range query performance. LEAD uses a recursive machine learning model to map and retrieve data across a distributed system while preserving the inherent order of the data. Its design minimizes range query latency and message cost while maintaining high scalability and resilience to network churn. Our comprehensive evaluations, conducted with both a testbed implementation and simulations, demonstrate that LEAD is substantially more efficient than existing range query methods in large-scale distributed systems, reducing query latency and message cost by 80% to 90%+. Furthermore, LEAD remains scalable and robust under system churn, making it a practical solution for efficient data retrieval in distributed key-value systems.
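
The abstract does not spell out the model itself, so the following is only a minimal Python sketch of the general idea of an order-preserving learned mapping, not LEAD's actual design. It assumes a piecewise-linear approximation of the key distribution's CDF, fitted on a sample of keys, in place of the paper's recursive model. The class name `LearnedOrderPreservingHash`, its methods, and the `num_segments` and `num_nodes` parameters are all hypothetical. Because the mapping is monotone, a range of keys lands on a contiguous run of nodes, which is what makes range queries cheap compared with a conventional hash that scatters neighboring keys.

```python
import bisect


class LearnedOrderPreservingHash:
    """Piecewise-linear CDF model fitted on a key sample.

    Hypothetical stand-in for a learned order-preserving mapping;
    it is not the recursive model described in the paper.
    """

    def __init__(self, sample_keys, num_segments=4):
        keys = sorted(sample_keys)
        self.num_segments = num_segments
        # Knots sit at evenly spaced quantiles of the sample.
        self.knots = [
            keys[(len(keys) - 1) * i // num_segments]
            for i in range(num_segments + 1)
        ]

    def position(self, key):
        """Estimated CDF value in [0, 1]; non-decreasing in key."""
        if key <= self.knots[0]:
            return 0.0
        if key >= self.knots[-1]:
            return 1.0
        # Find the segment with knots[i] <= key < knots[i + 1].
        i = bisect.bisect_right(self.knots, key) - 1
        lo, hi = self.knots[i], self.knots[i + 1]
        frac = (key - lo) / (hi - lo)
        return (i + frac) / self.num_segments

    def node_for(self, key, num_nodes):
        """Map a key to a node index, preserving key order."""
        return min(int(self.position(key) * num_nodes), num_nodes - 1)

    def range_nodes(self, low, high, num_nodes):
        """Nodes that may hold keys in [low, high]: one contiguous run."""
        first = self.node_for(low, num_nodes)
        last = self.node_for(high, num_nodes)
        return list(range(first, last + 1))


# Skewed sample: squares of 0..100, so small keys are much denser.
model = LearnedOrderPreservingHash([i * i for i in range(101)])
nodes = model.range_nodes(625, 2500, num_nodes=8)
```

In this sketch a range query only needs to contact the nodes returned by `range_nodes`, whereas a uniform hash would generally require contacting every node. How LEAD trains, distributes, and updates its model under churn is not covered here.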

Problem

Research questions and friction points this paper is trying to address.

Optimizes range query performance in distributed hash tables

Reduces latency and message costs for distributed key-value systems

Enhances scalability and resilience against network churn

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned models integrated into DHT structures

Recursive ML model for ordered data mapping

Minimizes latency and message cost while maintaining scalability

🔎 Similar Papers

BinomialHash: A Constant Time, Minimal Memory Consistent Hash Algorithm