Rateless Bloom Filters: Set Reconciliation for Divergent Replicas with Variable-Sized Elements

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper targets set synchronization for variable-sized elements in distributed systems under high divergence, such as recovery from a network partition, and proposes a two-stage hybrid protocol. Its key contribution is the Rateless Bloom Filter (RBF), a filter that adapts to the set difference without prior knowledge of its size and closely matches the communication complexity of an optimally configured static Bloom filter. The protocol pairs this rateless pre-filtering step with Invertible Bloom Lookup Tables (IBLTs) and incremental transmission of the encoding to synchronize variable-sized elements. In the evaluation, for Jaccard indices below 85%, the RBF-IBLT hybrid reduces total communication cost by more than 20% in the best case compared with state-of-the-art methods.

📝 Abstract
Set reconciliation protocols typically make two critical assumptions: that elements are fixed-size, and that the difference cardinality, d, is very small. To handle variable-sized elements, the current practice is to synchronize fixed-size element digests. However, when the number of differences is considerable, such as after a network partition, this approach can be inefficient. We propose a two-stage hybrid protocol that introduces a preliminary Bloom filter step designed specifically for this regime. The novelty of the approach lies in solving a core technical challenge: determining the optimal Bloom filter size without knowing d. Our solution is the Rateless Bloom Filter (RBF), a dynamic filter that naturally adapts to arbitrary symmetric differences, closely matching the communication complexity of an optimally configured static filter without requiring any prior parametrization. Our evaluation on sets of variable-sized elements shows that, for Jaccard indices below 85%, our RBF-IBLT hybrid protocol reduces the total communication cost by more than 20% in the best case compared to the state of the art.
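
To make the preliminary Bloom filter step concrete, the following Python sketch shows how a receiver can use a sender's filter to identify elements the sender certainly lacks, leaving only the remainder for a second reconciliation stage. It is a minimal illustration under assumptions, not the paper's implementation: the `BloomFilter` class, the SHA-256 double hashing, the fixed `num_bits=10_000` and `num_hashes=7`, and the sample key sets are all hypothetical choices made here.

```python
import hashlib


class BloomFilter:
    """Plain static Bloom filter over byte strings (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive the k bit positions by double hashing one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item))


def definitely_missing(local_items, remote_filter):
    """Local items the remote replica certainly lacks (no false negatives)."""
    return [x for x in local_items if x not in remote_filter]


# Hypothetical replicas: variable-sized keys, half of them shared.
alice = {f"key-{i}".encode() for i in range(1000)}
bob = {f"key-{i}".encode() for i in range(500, 1500)}

bf = BloomFilter(num_bits=10_000, num_hashes=7)  # sizes picked by hand
for x in alice:
    bf.add(x)

only_bob = definitely_missing(bob, bf)
print(len(only_bob), "of Bob's elements are confirmed absent from Alice")
```

Because a Bloom filter has no false negatives, everything in `only_bob` is a true difference; false positives merely leave a few of Bob's unique elements for the second stage to resolve. The hand-picked filter size is exactly the parameter that is hard to choose well without knowing d.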

Problem

Research questions and friction points this paper is trying to address.

Reconciling sets of variable-sized elements efficiently
Determining the optimal Bloom filter size without knowing the difference cardinality
Reducing communication cost in high-divergence scenarios such as network partition recovery
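
The link between the Jaccard index and the difference cardinality d, and the reason a static filter needs a parameter chosen up front, can be worked through in a short Python sketch. The equal-size assumption (|A| = |B| = n) and the function names are introduced here for illustration only. The sizing formula is the standard one for a static Bloom filter with a target false-positive rate; it is not taken from the paper, whose own optimality criterion the abstract does not spell out.

```python
import math


def symmetric_difference_size(n: int, jaccard: float) -> float:
    """d = |A △ B| for two sets of equal size n with the given Jaccard index.

    With c = |A ∩ B|, J = c / (2n - c), so c = 2nJ / (1 + J) and
    d = 2n - 2c = 2n (1 - J) / (1 + J).
    """
    return 2 * n * (1 - jaccard) / (1 + jaccard)


def static_bloom_parameters(n: int, fp_rate: float) -> tuple[int, int]:
    """Standard bit count m and hash count k for n items at a target rate."""
    m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


n = 1000
d = symmetric_difference_size(n, 0.85)
m, k = static_bloom_parameters(n, 0.01)
print(f"J = 0.85 with n = {n} gives d of about {d:.0f}")
print(f"a 1% false-positive filter for {n} items needs {m} bits and {k} hashes")
```

At a Jaccard index of 85% and equal-size sets of 1000 elements, roughly 162 elements differ. The static sizing function takes a false-positive rate as input, and the best rate to ask for depends on how many differences there are, which is the quantity the replicas do not know in advance.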

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid protocol with preliminary Bloom filter step
Rateless Bloom Filter adapts to arbitrary symmetric differences
Dynamic filter matches optimal static filter communication complexity
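
The idea of a filter that needs no prior parametrization can be illustrated with a simple incremental scheme. The Python sketch below is not the paper's RBF construction, which the abstract does not describe; it is an assumed stand-in in which the sender streams small one-hash filter slices and the receiver narrows its candidate set after each one, so the total filter size is decided during the exchange rather than beforehand. The function names, the slice size of 2048 bits, and the fixed budget of 8 slices are hypothetical.

```python
import hashlib
from itertools import count


def slice_position(item: bytes, index: int, slice_bits: int) -> int:
    """Bit position of an item in slice `index` (hash seeded by the index)."""
    digest = hashlib.sha256(index.to_bytes(4, "big") + item).digest()
    return int.from_bytes(digest[:8], "big") % slice_bits


def filter_slices(items, slice_bits):
    """Sender side: yield an unbounded stream of one-hash filter slices."""
    for index in count():
        bits = [False] * slice_bits
        for item in items:
            bits[slice_position(item, index, slice_bits)] = True
        yield bits


def refine(candidates, bits, index):
    """Receiver side: split candidates into still-possible and surely absent."""
    keep, absent = set(), set()
    for item in candidates:
        hit = bits[slice_position(item, index, len(bits))]
        (keep if hit else absent).add(item)
    return keep, absent


# Hypothetical replicas: variable-sized keys, half of them shared.
alice = {f"key-{i}".encode() for i in range(1000)}
bob = {f"key-{i}".encode() for i in range(500, 1500)}

candidates, confirmed_absent = set(bob), set()
for index, bits in enumerate(filter_slices(alice, slice_bits=2048)):
    candidates, absent = refine(candidates, bits, index)
    confirmed_absent |= absent
    if index + 1 == 8:  # hypothetical stopping rule: fixed budget of 8 slices
        break

print(len(confirmed_absent), "elements confirmed absent after 8 slices")
```

Each slice can only move elements from `candidates` to `confirmed_absent`, never the reverse, so the receiver may stop consuming slices at any point. A practical stopping rule would weigh the cost of another slice against the differences it is still expected to expose; the fixed budget above is only a placeholder for such a rule.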