🤖 AI Summary
In the online setting, where keys arrive sequentially and existing keys may be moved, it has not been known whether d-ary cuckoo hashing can support a load factor of 1 − ε using only d = O(ln(1/ε)) hash functions while keeping expected insertion time polynomial in 1/ε.
Method: This paper proposes bubble-up cuckoo hashing, an implementation of d-ary cuckoo hashing for the online setting that uses only d = ⌈ln(1/ε) + α⌉ hash locations per key, where α is an arbitrarily small positive constant and 1 − ε is the highest supported load factor.
Contribution/Results: The paper proves that bubble-up cuckoo hashing achieves expected insertion time O(1/δ) for any insertion at load factor 1 − δ ≤ 1 − ε, and expected positive query time O(1), independent of d and ε, even though negative queries must take time d. The number of hash functions is essentially optimal: it matches the offline bound of ⌈ln(1/ε) + o(1)⌉ up to the arbitrarily small constant α, without compromising insertion time. The O(1) positive-query bound is of interest even in the offline setting.
📝 Abstract
A $d$-ary cuckoo hash table is an open-addressed hash table that stores each key $x$ in one of $d$ random positions $h_1(x), h_2(x), \ldots, h_d(x)$. In the offline setting, where all items are given and keys need only be matched to locations, it is possible to support a load factor of $1 - \epsilon$ while using $d = \lceil \ln \epsilon^{-1} + o(1) \rceil$ hashes. The online setting, where keys are moved as new keys arrive sequentially, has the additional challenge of the time to insert new keys, and it has not been known whether one can use $d = O(\ln \epsilon^{-1})$ hashes to support $\mathrm{poly}(\epsilon^{-1})$ expected-time insertions. In this paper, we introduce bubble-up cuckoo hashing, an implementation of $d$-ary cuckoo hashing that achieves all of the following properties simultaneously: (1) uses $d = \lceil \ln \epsilon^{-1} + \alpha \rceil$ hash locations per item for an arbitrarily small positive constant $\alpha$. (2) achieves expected insertion time $O(\delta^{-1})$ for any insertion taking place at load factor $1 - \delta \le 1 - \epsilon$. (3) achieves expected positive query time $O(1)$, independent of $d$ and $\epsilon$. The first two properties give an essentially optimal value of $d$ without compromising insertion time. The third property is interesting even in the offline setting: it says that, even though \emph{negative} queries must take time $d$, positive queries can actually be implemented in $O(1)$ expected time, even when $d$ is large.
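To make the setting concrete, the following is a minimal Python sketch of a plain $d$-ary cuckoo hash table. It is not the bubble-up algorithm, whose details the abstract does not give. It uses the textbook random-walk eviction strategy and only illustrates what "one of $d$ random positions" and "negative queries must take time $d$" mean in practice. The names `num_hashes`, `DAryCuckooTable`, the salted use of Python's built-in `hash`, the `max_kicks` cutoff, and the overflow `stash` are all assumptions made for illustration.

```python
import math
import random


def num_hashes(eps: float, alpha: float = 0.1) -> int:
    """Hash count d = ceil(ln(1/eps) + alpha), as quoted in the abstract."""
    return math.ceil(math.log(1.0 / eps) + alpha)


class DAryCuckooTable:
    """Plain d-ary cuckoo hashing with random-walk eviction (NOT bubble-up)."""

    def __init__(self, capacity: int, d: int, seed: int = 0):
        self.capacity = capacity
        self.d = d
        self.slots = [None] * capacity  # None marks an empty slot
        self.stash = []                 # assumed overflow area for failed inserts
        self.rng = random.Random(seed)
        # Assumption: d salted copies of Python's hash() stand in for
        # the d independent random hash functions h_1, ..., h_d.
        self.salts = [self.rng.getrandbits(64) for _ in range(d)]

    def positions(self, key):
        """Return the d candidate slots h_1(key), ..., h_d(key)."""
        return [hash((salt, key)) % self.capacity for salt in self.salts]

    def lookup(self, key) -> bool:
        # A negative query has to inspect all d candidate slots.
        if any(self.slots[p] == key for p in self.positions(key)):
            return True
        return key in self.stash

    def insert(self, key, max_kicks: int = 10_000) -> None:
        if self.lookup(key):
            return
        current = key
        for _ in range(max_kicks):
            candidates = self.positions(current)
            for p in candidates:
                if self.slots[p] is None:
                    self.slots[p] = current
                    return
            # All d slots are occupied: evict a random occupant and
            # continue the walk with the evicted key.
            victim = self.rng.choice(candidates)
            current, self.slots[victim] = self.slots[victim], current
        # Give up on the walk; a real table would rehash or resize here.
        self.stash.append(current)
```

For example, `num_hashes(0.01)` evaluates to 5 with the default `alpha = 0.1`, since $\ln 100 \approx 4.605$. A table built as `DAryCuckooTable(capacity=1000, d=3)` can then be filled with `insert` and probed with `lookup`. In this baseline a positive lookup may also scan up to $d$ slots, which is the cost that property (3) of the abstract removes in expectation.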