🤖 AI Summary
Existing Cuckoo filters need fingerprints of (k+3) bits per key and a space overhead factor of at least 1.05 to reach a false-positive rate of 2^-k, and the original design restricts the number of buckets to a power of two, which can roughly double the space used. This paper replaces disjoint buckets with overlapping windows, which reduces the number of alternative slots where a fingerprint may be found, shrinks fingerprints from (k+3) to (k+2) bits, and lowers the space overhead factor to 1.06(1+2/k). A different placement strategy removes the power-of-two restriction, so the number of buckets can be chosen freely. The evaluation shows that the windowed layout reduces filter size in practice as well as in theory. Compared with Prefix filters and Vector Quotient Filters (VQFs), windowed Cuckoo filters are the smallest filters supporting online insertions, with similarly fast queries but longer insertion times.
📝 Abstract
Cuckoo filters are space-efficient approximate set membership data structures with a controllable false positive rate (FPR) and zero false negatives, similar to Bloom filters. In contrast to Bloom filters, Cuckoo filters store multi-bit fingerprints of keys in a hash table using variants of Cuckoo hashing, allowing each fingerprint to be stored at a small number of possible locations. Existing Cuckoo filters use fingerprints of $(k+3)$ bits per key and an additional space overhead factor of at least $1.05$ to achieve an FPR of $2^{-k}$. For $k=10$, this amounts to $1.365\, kn$ bits to store $n$ keys, which is better than $1.443\, kn$ bits for Bloom filters. The $+3$ for the fingerprint size is required to balance out the multiplied FPR caused by looking for the fingerprint at several locations. In the original Cuckoo filter, the number of hash table buckets is restricted to a power of 2, which may lead to much larger space overheads, up to $2.1\, (1+3/k)\, kn$ bits. We present two improvements of Cuckoo filters. First, we remove the restriction that the number of buckets must be a power of 2 by using a different placement strategy. Second, we reduce the space overhead factor of Cuckoo filters to $1.06\, (1+2/k)$ by using overlapping windows instead of disjoint buckets to maintain the load threshold of the hash table, while reducing the number of alternative slots where any fingerprint may be found. A detailed evaluation demonstrates that the alternative memory layout based on overlapping windows decreases the size of Cuckoo filters not only in theory, but also in practice. A comparison with other state-of-the-art filter types, Prefix filters and Vector Quotient filters (VQFs), shows that the reduced space overhead makes windowed Cuckoo filters the smallest filters supporting online insertions, with similarly fast queries, but longer insertion times.
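
The space figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below only evaluates the stated formulas in bits per key; it does not implement any filter. The function names are hypothetical, and reading the worst-case factor $2.1$ as $2 \times 1.05$ (a table almost twice as large as needed after rounding the bucket count up to a power of 2) is an interpretation of the abstract, not something it states.

```python
import math

def bloom_bits_per_key(k):
    # Space-optimal Bloom filter: log2(e) * k ~ 1.443 * k bits per key for FPR 2^-k.
    return math.log2(math.e) * k

def cuckoo_bits_per_key(k):
    # Existing Cuckoo filters: (k+3)-bit fingerprints, overhead factor of at least 1.05.
    return 1.05 * (k + 3)

def cuckoo_pow2_worst_case_bits_per_key(k):
    # Worst case quoted for the power-of-2 bucket restriction: 2.1 * (1 + 3/k) * k.
    # Assumption: 2.1 = 2 * 1.05, i.e. the table is nearly twice as large as needed.
    return 2.1 * (k + 3)

def windowed_cuckoo_bits_per_key(k):
    # Windowed Cuckoo filter: overhead factor 1.06 * (1 + 2/k), i.e. 1.06 * (k + 2).
    return 1.06 * (k + 2)

if __name__ == "__main__":
    k = 10  # target FPR of 2^-10
    for name, f in [
        ("Bloom", bloom_bits_per_key),
        ("Cuckoo (existing)", cuckoo_bits_per_key),
        ("Cuckoo (power-of-2 worst case)", cuckoo_pow2_worst_case_bits_per_key),
        ("Cuckoo (windowed)", windowed_cuckoo_bits_per_key),
    ]:
        print(f"{name}: {f(k):.2f} bits per key")
```

For $k=10$, dividing each result by $k$ gives the factors in front of $kn$: the existing Cuckoo bound yields the $1.365$ from the abstract, while the windowed bound gives $1.06 \cdot 1.2 = 1.272$.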