🤖 AI Summary
Bloom filters in key-value stores suffer from parameter rigidity: the number of hash functions must be an integer, and bit-array lengths are typically constrained to powers of two, which limits how far the false-positive rate (FPR) and space efficiency can be optimized. This paper proposes two variants to overcome these constraints. First, the Rational Bloom Filter (RBF) supports a rational (non-integer) number of hash functions, breaking the integer constraint while still mapping keys to array indices with universal hash functions such as Murmur. Second, the Variably-Sized Block Bloom Filter (VSBF) builds on the RBF with a block-based memory layout that removes the power-of-two restriction on the bit-array length while keeping index computation efficient, especially for large filters. Both designs remain compatible with standard universal hashing and allow the fraction of zero bits to be tuned precisely for a known number of inserted elements. Experiments demonstrate lower FPR at identical space overhead, scalable computation for large filters, and fraction-of-zero values approaching the theoretical optimum.
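To make the integer constraint concrete: under the standard approximation, a filter with m bits, n inserted keys, and k hash functions has FPR ≈ (1 - e^(-kn/m))^k. This is minimized at k* = (m/n) ln 2, where the expected fraction of zero bits is exactly 1/2. The short Python sketch below evaluates these textbook formulas; the function names are illustrative and not taken from the paper.

```python
import math


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Standard approximation of a Bloom filter's FPR: (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued k that minimizes the approximation above: (m / n) * ln 2."""
    return (m / n) * math.log(2)


def fraction_of_zeros(m: int, n: int, k: float) -> float:
    """Expected fraction of bits still 0 after n insertions: e^(-k n / m)."""
    return math.exp(-k * n / m)


if __name__ == "__main__":
    m, n = 10_000, 1_000          # 10 bits per inserted element
    k_star = optimal_k(m, n)      # about 6.93, i.e. not an integer
    for k in (math.floor(k_star), k_star, math.ceil(k_star)):
        print(f"k={k:.3f}  FPR={false_positive_rate(m, n, k):.6f}  "
              f"zeros={fraction_of_zeros(m, n, k):.3f}")
```

For 10 bits per element, k* ≈ 6.93; an integer-only filter must round to 6 or 7 and accept a slightly higher FPR, which is the gap a rational number of hash functions is meant to close.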
📝 Abstract
These days, key-value stores are widely used for scalable data storage. In this environment, Bloom filters serve as an efficient probabilistic data structure for representing sets of keys, as they allow set membership queries with controllable false positive rates and no false negatives. For optimal error rates, the right choice of the main parameters is crucial: the length of the Bloom filter array, the number of hash functions used to map an element to the array's indices, and the number of elements to be inserted into one filter. However, these parameters are constrained: the number of hash functions is restricted to integer values, and the length of a Bloom filter is usually chosen to be a power of two to allow for efficient modulo operations using binary arithmetic. These modulo calculations are necessary to map from the output universe of the applied universal hash functions, like Murmur, to the set of indices of the Bloom filter. In this paper, we relax these constraints by proposing the Rational Bloom filter, which allows for non-integer numbers of hash functions. This results in optimized fraction-of-zero values for a known number of elements to be inserted. Based on this, we construct Variably-Sized Block Bloom filters to allow for a flexible filter length, especially for large filters, while keeping computation efficient.
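One way to realize a non-integer number of hash functions can be sketched as follows. This is an illustrative assumption, not necessarily the construction used in the paper: each key always uses floor(k) hash functions, and a deterministic, key-derived coin decides whether it uses one extra function, firing for a fraction frac(k) of keys so that the average per key is k. Because the coin depends only on the key, insertion and lookup always agree, so there are still no false negatives. The class name `RationalBloomFilter`, the use of `hashlib.blake2b` as a stand-in for Murmur, and the double-hashing index scheme are all hypothetical choices. The `_reduce` helper shows why power-of-two lengths are attractive: `h & (m - 1)` equals `h % m` only when m is a power of two.

```python
import hashlib
import math


class RationalBloomFilter:
    """Hypothetical sketch of a Bloom filter with a non-integer number of hashes k.

    Assumption: every key uses floor(k) hash functions, and a key-derived coin
    adds one extra hash function for a fraction frac(k) of keys.
    """

    def __init__(self, m: int, k: float):
        if m <= 0 or k < 1:
            raise ValueError("m must be positive and k must be at least 1")
        self.m = m
        self.k_int = math.floor(k)        # hash functions every key uses
        self.k_frac = k - self.k_int      # probability of the extra hash function
        self.bits = bytearray((m + 7) // 8)
        # Power-of-two lengths allow a cheap bitmask instead of a modulo.
        self.mask = m - 1 if m & (m - 1) == 0 else None

    def _reduce(self, h: int) -> int:
        """Map a non-negative hash value onto the index range [0, m)."""
        return h & self.mask if self.mask is not None else h % self.m

    def _indices(self, key: bytes) -> list[int]:
        # blake2b stands in for a universal hash such as Murmur (assumption).
        d = hashlib.blake2b(key, digest_size=24).digest()
        h1 = int.from_bytes(d[0:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1          # odd step size
        coin = int.from_bytes(d[16:24], "little") / 2**64   # uniform in [0, 1)
        count = self.k_int + (1 if coin < self.k_frac else 0)
        # Double hashing: the i-th index is h1 + i * h2, reduced into range.
        return [self._reduce(h1 + i * h2) for i in range(count)]

    def add(self, key: bytes) -> None:
        for idx in self._indices(key):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[idx >> 3] & (1 << (idx & 7))
                   for idx in self._indices(key))

    def zero_fraction(self) -> float:
        """Fraction of the m bits that are still 0."""
        ones = sum(bin(b).count("1") for b in self.bits)
        return 1 - ones / self.m


if __name__ == "__main__":
    bf = RationalBloomFilter(m=1 << 14, k=6.93)
    for i in range(1000):
        bf.add(f"key-{i}".encode())
    print(b"key-42" in bf, round(bf.zero_fraction(), 3))
```

With m = 2^14 the bitmask path is taken; with a length such as m = 10,000 the sketch falls back to a modulo, which illustrates the trade-off Variably-Sized Block Bloom filters target: flexible lengths without giving up efficient index computation.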