Efficient $d$-ary Cuckoo Hashing at High Load Factors by Bubbling Up

📅 2025-01-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In the online setting, where keys arrive sequentially and existing keys may be moved to make room, it was not known whether d-ary cuckoo hashing could support a load factor of 1 − ε with only d = O(ln(1/ε)) hash functions while keeping expected insertion time polynomial in 1/ε. Method: This paper introduces bubble-up cuckoo hashing, an implementation of d-ary cuckoo hashing that uses d = ⌈ln(1/ε) + α⌉ hash locations per item, where α is an arbitrarily small positive constant. Contribution/Results: The authors show that any insertion taking place at load factor 1 − δ ≤ 1 − ε has expected time O(1/δ), and that positive queries take expected time O(1), independent of d and ε, even though negative queries must take time d. The first two properties give an essentially optimal value of d without compromising insertion time; the O(1) positive-query bound is of interest even in the offline setting.
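To make the formula d = ⌈ln(1/ε) + α⌉ concrete, the following minimal sketch evaluates it for a few target load factors. The function name `num_hashes` and the default `alpha=0.1` are illustrative assumptions, not values taken from the paper.

```python
import math


def num_hashes(epsilon: float, alpha: float = 0.1) -> int:
    """Return d = ceil(ln(1/epsilon) + alpha) for a target load factor 1 - epsilon.

    `alpha` is the small positive constant from the summary; 0.1 is an
    arbitrary illustrative default, not a value from the paper.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return math.ceil(math.log(1.0 / epsilon) + alpha)


# Number of hash locations per key for load factors 0.9, 0.99 and 0.999.
for eps in (0.1, 0.01, 0.001):
    print(f"load factor {1 - eps:.3f}: d = {num_hashes(eps)}")
```

The number of hash locations grows only logarithmically in 1/ε: with these inputs, moving from load factor 0.9 to 0.999 raises d from 3 to 8.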

📝 Abstract

A $d$-ary cuckoo hash table is an open-addressed hash table that stores each key $x$ in one of $d$ random positions $h_1(x), h_2(x), \ldots, h_d(x)$. In the offline setting, where all items are given and keys need only be matched to locations, it is possible to support a load factor of $1 - \epsilon$ while using $d = \lceil \ln \epsilon^{-1} + o(1) \rceil$ hashes. The online setting, where keys are moved as new keys arrive sequentially, has the additional challenge of the time to insert new keys, and it has not been known whether one can use $d = O(\ln \epsilon^{-1})$ hashes to support $\operatorname{poly}(\epsilon^{-1})$ expected-time insertions. In this paper, we introduce bubble-up cuckoo hashing, an implementation of $d$-ary cuckoo hashing that achieves all of the following properties simultaneously: (1) uses $d = \lceil \ln \epsilon^{-1} + \alpha \rceil$ hash locations per item for an arbitrarily small positive constant $\alpha$. (2) achieves expected insertion time $O(\delta^{-1})$ for any insertion taking place at load factor $1 - \delta \le 1 - \epsilon$. (3) achieves expected positive query time $O(1)$, independent of $d$ and $\epsilon$. The first two properties give an essentially optimal value of $d$ without compromising insertion time. The third property is interesting even in the offline setting: it says that, even though \emph{negative} queries must take time $d$, positive queries can actually be implemented in $O(1)$ expected time, even when $d$ is large.
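For readers unfamiliar with the data structure, here is a minimal sketch of a plain $d$-ary cuckoo hash table using the classical random-walk eviction strategy. It is a baseline only and does not implement the paper's bubble-up algorithm, whose details are not given in the abstract. The class name `DAryCuckooTable`, the salted use of Python's built-in `hash` as a stand-in for $h_1, \ldots, h_d$, and the `max_kicks` cutoff are all assumptions made for illustration.

```python
import random


class DAryCuckooTable:
    """Baseline d-ary cuckoo hash table with random-walk eviction.

    Illustrative only: this is NOT bubble-up cuckoo hashing. Keys must be
    hashable and must not be None (None marks an empty slot).
    """

    def __init__(self, capacity: int, d: int, seed: int = 0, max_kicks: int = 10_000):
        self.capacity = capacity
        self.d = d
        self.max_kicks = max_kicks  # hypothetical cutoff before giving up
        self.slots = [None] * capacity
        self._rng = random.Random(seed)
        # One random salt per hash function; hash((salt, key)) stands in for h_i(key).
        self._salts = [self._rng.getrandbits(64) for _ in range(d)]

    def positions(self, key):
        """Return the d candidate slots h_1(key), ..., h_d(key)."""
        return [hash((salt, key)) % self.capacity for salt in self._salts]

    def contains(self, key) -> bool:
        """Probe candidate slots in order; a negative query inspects all d of them."""
        return any(self.slots[p] == key for p in self.positions(key))

    def insert(self, key) -> bool:
        """Insert key, evicting residents along a random walk if necessary.

        Returns False if the walk exceeds max_kicks; in that case the key
        held at the end of the walk is left out of the table.
        """
        if self.contains(key):
            return True
        current = key
        for _ in range(self.max_kicks):
            candidates = self.positions(current)
            for p in candidates:
                if self.slots[p] is None:
                    self.slots[p] = current
                    return True
            # All d candidates are full: swap with a random resident and retry with it.
            victim = self._rng.choice(candidates)
            current, self.slots[victim] = self.slots[victim], current
        return False


# Usage: fill a 1000-slot table to load factor 0.5 with d = 4 hash locations.
table = DAryCuckooTable(capacity=1000, d=4)
for k in range(500):
    table.insert(k)
```

In this baseline a lookup probes the candidate slots in order, so a positive query may inspect several slots and a negative query always inspects all $d$. This is the $d$-dependent query cost that the abstract's third property removes for positive queries.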

Problem

Research questions and friction points this paper is trying to address.

Dynamic Data Loading

d-ary Cuckoo Hashing

Efficiency Improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bubble-up Hashing

Dynamic Data Insertion

Optimal Storage Efficiency
