🤖 AI Summary
Learned Bloom Filters (LBFs) keep the Bloom filter's one-sided error (no false negatives), but their false positive guarantees break down when an adversary chooses the queries. This undermines their reliability in security-critical applications. Method: We propose the Downtown Bodega Filter (DBF), the first LBF variant with provable adversarial security. DBF combines a classical Bloom filter and a learned model with a lightweight security mechanism that adds only 2λ extra bits of memory and at most one pseudorandom permutation on the critical path. We formalize a strong adversary model covering computationally bounded (probabilistic polynomial-time) adversaries and, assuming pseudorandom permutations exist, prove DBF secure within it. We further introduce a hybrid model in which an adversary chooses a fraction of the query workload, unifying realistic data distributions and adversarial scenarios. Contribution/Results: Theoretical analysis and experiments show that DBF preserves the one-sided error property with low false positive rates while reducing query latency. It also outperforms state-of-the-art baselines in both security guarantees and end-to-end performance.
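To make the moving parts concrete, here is a minimal Python sketch of a generic learned Bloom filter. A learned model answers first, and every key the model rejects goes into a backup Bloom filter, which is what keeps the error one-sided. As our own assumption, the backup filter derives its bit positions from HMAC-SHA256 under a secret key, a pseudorandom function used as a stand-in for keyed hashing. This is not the Downtown Bodega Filter: it uses no pseudorandom permutation and makes no attempt to protect the learned model's own false positives. The class names, `toy_model`, the 256-bit key, and all parameters are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Iterable, Iterator, Optional


class KeyedBloomFilter:
    """Classical Bloom filter whose bit positions come from a keyed PRF.

    HMAC-SHA256 stands in for the pseudorandom function (an assumption);
    without the secret key, an adversary cannot compute bit positions offline.
    """

    def __init__(self, n_bits: int, n_hashes: int, key: bytes) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self._key = key
        self._bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive n_hashes positions by MACing (hash index || item).
        data = item.encode("utf-8")
        for i in range(self.n_hashes):
            tag = hmac.new(self._key, i.to_bytes(4, "big") + data, hashlib.sha256).digest()
            yield int.from_bytes(tag[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self._bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self._bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


class LearnedBloomFilter:
    """Learned model in front of a keyed backup Bloom filter.

    Keys the model rejects are inserted into the backup filter, so every
    inserted key is reported present: the error stays one-sided.
    """

    def __init__(
        self,
        keys: Iterable[str],
        score: Callable[[str], float],
        threshold: float,
        n_bits: int,
        n_hashes: int,
        key: Optional[bytes] = None,
    ) -> None:
        self._score = score
        self._threshold = threshold
        # Hypothetical choice: a fresh 256-bit secret key per filter.
        secret = key if key is not None else secrets.token_bytes(32)
        self._backup = KeyedBloomFilter(n_bits, n_hashes, secret)
        for k in keys:
            if score(k) < threshold:  # model false negative -> backup filter
                self._backup.add(k)

    def __contains__(self, item: str) -> bool:
        # Present if the model accepts it or the backup filter reports it.
        return self._score(item) >= self._threshold or item in self._backup


# Toy "model" (hypothetical): scores even integers as likely members.
def toy_model(s: str) -> float:
    return 1.0 if s.isdigit() and int(s) % 2 == 0 else 0.0


members = [str(i) for i in range(0, 200, 2)] + ["7", "13"]
lbf = LearnedBloomFilter(members, toy_model, 0.5, n_bits=1024, n_hashes=4)
print(all(k in lbf for k in members))  # every member is found
print("1000" in lbf)  # a non-member the toy model wrongly accepts
```

The last check shows why the model is the hard part: `"1000"` is not a member, yet the toy model accepts it, and keying the backup filter does nothing to change that.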
📝 Abstract
The Learned Bloom Filter is a recently proposed data structure that combines the Bloom Filter with a Learning Model while preserving the Bloom Filter's one-sided error guarantees. Constructing an adversary-resilient Learned Bloom Filter with provable guarantees is an open problem. We define a strong adversarial model for the Learned Bloom Filter. Our model extends an adversarial model that prior work designed for the Classical (i.e., not "Learned") Bloom Filter, and it considers computationally bounded adversaries that run in probabilistic polynomial time (PPT). Using this model, we construct an adversary-resilient variant of the Learned Bloom Filter called the Downtown Bodega Filter. We show that if pseudo-random permutations exist, then an Adversary Resilient Learned Bloom Filter may be constructed with $2\lambda$ extra bits of memory and at most one extra pseudo-random permutation in the critical path. We also construct a hybrid adversarial model for the case where an adversary chooses a fraction of the query workload. We show realistic scenarios in which the Downtown Bodega Filter gives better performance guarantees than alternative approaches in this hybrid model.
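As a back-of-the-envelope illustration of the hybrid setting (our own simplification, not the paper's analysis), suppose a fraction $\alpha$ of the non-member queries is chosen by the adversary and the rest come from a benign distribution. If the filter's false positive rate is $p_b$ on benign queries and $p_a$ on adversarial ones, the expected false positive rate over the workload is the mixture

$$
\mathrm{FPR}(\alpha) = (1-\alpha)\,p_b + \alpha\,p_a .
$$

For example, with $\alpha = 0.1$ and $p_b = 0.01$, a filter whose adversary can replay known false positives ($p_a = 1$) sees $0.9 \cdot 0.01 + 0.1 \cdot 1 = 0.109$, roughly eleven times its benign rate, whereas a filter that keeps $p_a$ close to $p_b$ stays near $0.01$. The symbols $\alpha$, $p_b$, $p_a$ and these numbers are hypothetical.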