Smaller and More Flexible Cuckoo Filters

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Cuckoo filters need (k+3)-bit fingerprints and a space overhead factor of at least 1.05 to reach a false-positive rate of 2^-k, and the original design restricts the number of buckets to a power of two, which can roughly double the memory footprint. This paper proposes two changes. First, a different placement strategy removes the power-of-two restriction, so the number of buckets can be chosen freely. Second, an overlapping-window memory layout replaces the disjoint buckets and reduces the number of alternative slots in which a fingerprint may be found; this lowers the space overhead factor from at least 1.05(1+3/k) to 1.06(1+2/k) while maintaining the load threshold of the hash table. In a comparison with Prefix filters and Vector Quotient Filters (VQFs), windowed Cuckoo filters are the smallest filters that support online insertions, with similarly fast queries but longer insertion times.

📝 Abstract
Cuckoo filters are space-efficient approximate set membership data structures with a controllable false positive rate (FPR) and zero false negatives, similar to Bloom filters. In contrast to Bloom filters, Cuckoo filters store multi-bit fingerprints of keys in a hash table using variants of Cuckoo hashing, allowing each fingerprint to be stored at a small number of possible locations. Existing Cuckoo filters use fingerprints of $(k+3)$ bits per key and an additional space overhead factor of at least $1.05$ to achieve an FPR of $2^{-k}$. For $k=10$, this amounts to $1.365\,kn$ bits to store $n$ keys, which is better than $1.443\,kn$ bits for Bloom filters. The $+3$ for the fingerprint size is required to balance out the multiplied FPR caused by looking for the fingerprint at several locations. In the original Cuckoo filter, the number of hash table buckets is restricted to a power of 2, which may lead to much larger space overheads, up to $2.1\,(1+3/k)\,kn$ bits. We present two improvements of Cuckoo filters. First, we remove the restriction that the number of buckets must be a power of 2 by using a different placement strategy. Second, we reduce the space overhead factor of Cuckoo filters to $1.06\,(1+2/k)$ by using overlapping windows instead of disjoint buckets to maintain the load threshold of the hash table, while reducing the number of alternative slots where any fingerprint may be found. A detailed evaluation demonstrates that the alternative memory layout based on overlapping windows decreases the size of Cuckoo filters not only in theory, but also in practice. A comparison with other state-of-the-art filter types, Prefix filters and Vector Quotient filters (VQFs), shows that the reduced space overhead makes windowed Cuckoo filters the smallest filters supporting online insertions, with similarly fast queries, but longer insertion times.
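The space figures quoted in the abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are invented here, the union bound `slots * 2^-f` is a standard approximation rather than the paper's exact analysis, and the slot counts (8 for a two-buckets-of-four layout, 4 for the windowed layout) are assumptions inferred from the +3 and +2 extra fingerprint bits.

```python
import math

def bloom_factor():
    # Bloom filters need about log2(e) * k bits per key for an FPR of 2^-k.
    return math.log2(math.e)

def cuckoo_factor(k, extra_bits, overhead):
    # Space as a multiple of k*n bits: overhead * (k + extra_bits) / k.
    return overhead * (1 + extra_bits / k)

def fpr_union_bound(slots, fingerprint_bits):
    # A query compares against `slots` stored fingerprints; each one matches
    # a non-member with probability 2^-fingerprint_bits (union bound).
    return slots * 2.0 ** -fingerprint_bits

k = 10
print(f"Bloom filter:                     {bloom_factor():.3f} kn bits")
print(f"Cuckoo, k+3 bits, factor 1.05:    {cuckoo_factor(k, 3, 1.05):.3f} kn bits")
print(f"Cuckoo, power-of-2 worst case:    {cuckoo_factor(k, 3, 2.1):.3f} kn bits")
print(f"Windowed, k+2 bits, factor 1.06:  {cuckoo_factor(k, 2, 1.06):.3f} kn bits")

# Assumed slot counts: 8 slots need 3 extra bits, 4 slots need 2 extra bits.
print(fpr_union_bound(8, k + 3) == 2.0 ** -k)
print(fpr_union_bound(4, k + 2) == 2.0 ** -k)
```

For k = 10 this gives a factor of 1.365 for existing Cuckoo filters, matching the abstract, and 1.272 for the windowed variant.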
Problem

Research questions and friction points this paper is trying to address.

Reducing space overhead in Cuckoo filters
Eliminating the power-of-2 restriction on the number of buckets
Improving memory layout with overlapping windows
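The power-of-2 restriction in the original Cuckoo filter comes from its XOR-based partial-key hashing, where the alternative bucket is computed as `i XOR hash(fingerprint)`. The sketch below uses hypothetical function names and a fixed integer in place of the fingerprint hash. It checks whether applying the mapping twice returns to the starting bucket when the result is reduced modulo the bucket count. It does not show the paper's replacement strategy, which this summary does not describe.

```python
def alt_bucket_xor(i, fp_hash, num_buckets):
    # XOR-based alternative bucket, reduced into the table range.
    return (i ^ fp_hash) % num_buckets

def is_involution(num_buckets, fp_hash):
    # True if jumping to the alternative bucket twice always returns to the
    # starting bucket, which relocation during insertion relies on.
    return all(
        alt_bucket_xor(alt_bucket_xor(i, fp_hash, num_buckets),
                       fp_hash, num_buckets) == i
        for i in range(num_buckets)
    )

fp_hash = 5  # stand-in for hash(fingerprint), chosen for illustration
print(is_involution(8, fp_hash))  # power-of-2 bucket count
print(is_involution(6, fp_hash))  # non-power-of-2 bucket count
```

With 8 buckets the check succeeds; with 6 buckets it fails (for example, bucket 2 maps to bucket 1, which maps to bucket 4), so a fingerprint moved to its alternative bucket could no longer be traced back.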
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alternative placement strategy removes the power-of-2 restriction on the number of buckets
Overlapping windows reduce space overhead factor
Windowed Cuckoo filters are the smallest filters supporting online insertions, with similarly fast queries but longer insertion times
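The difference between disjoint buckets and overlapping windows can be pictured as two ways of grouping the same array of slots. The toy model below is an assumption made for illustration, not the paper's layout: it treats a window as a run of consecutive slots that may start at any slot, and all names and sizes are hypothetical.

```python
def bucket_slots(bucket, bucket_size):
    # Disjoint layout: bucket b owns slots [b*size, (b+1)*size).
    start = bucket * bucket_size
    return list(range(start, start + bucket_size))

def window_slots(window, window_size):
    # Overlapping layout (assumed): window w covers slots [w, w+size).
    return list(range(window, window + window_size))

num_slots, size = 12, 4
num_buckets = num_slots // size      # disjoint groups of 4 slots
num_windows = num_slots - size + 1   # one window per valid start slot

print(num_buckets, [bucket_slots(b, size) for b in range(num_buckets)])
print(num_windows, [window_slots(w, size) for w in range(3)])

# Neighbouring windows share slots, neighbouring buckets do not.
shared = set(window_slots(0, size)) & set(window_slots(1, size))
print(sorted(shared))
```

In this toy model the same 12 slots form 3 disjoint buckets but 9 overlapping windows, and adjacent windows share three slots.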
Johanna Elena Schmitz
Algorithmic Bioinformatics, Faculty of Mathematics and Computer Science, Saarland University; Saarbrücken Graduate School of Computer Science; Center for Bioinformatics Saar, Saarland Informatics Campus, Saarbrücken, Germany
Jens Zentgraf
Algorithmic Bioinformatics, Faculty of Mathematics and Computer Science, Saarland University; Saarbrücken Graduate School of Computer Science; Center for Bioinformatics Saar, Saarland Informatics Campus, Saarbrücken, Germany
Sven Rahmann
Center for Bioinformatics Saar and Saarland Informatics Campus, Saarland University
Algorithmic Bioinformatics, Sequence Analysis, Hashing, Filters, Combinatorial Optimization