Elastic Sketch under Random Stationary Streams: Limiting Behavior and Near-Optimal Configuration

📅 2026-03-17
🤖 AI Summary
This work addresses the trade-off between memory usage and estimation accuracy in frequency estimation over data streams. Under a stationary random stream model, the authors analyze Elastic Sketch, which combines a heavy block that aims to keep exact counts for a small set of elected high-frequency items with a light block that uses Count-Min Sketch to summarize the remaining traffic. As the stream length goes to infinity, they derive closed-form expressions for the limiting distribution of the counters and for the expected estimation error. Because these expressions are efficiently computable, they support grid-based tuning of the memory split between the two blocks and of the eviction threshold. The authors also characterize the structure of the optimal eviction threshold, which substantially narrows the parameter search space. The theoretical results are validated by numerical simulations on finite Zipf-distributed streams.
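
To make the stream model concrete, the following is a minimal sketch of a stationary stream with Zipf-distributed arrivals. It assumes that "stationary" means independent draws from a fixed distribution over a finite item set; that reading, the function names, and the parameters `n_items`, `alpha`, and `seed` are illustrative assumptions, not details taken from the paper.

```python
import random


def zipf_probabilities(n_items, alpha):
    """Truncated Zipf law: P(rank r) is proportional to 1 / r**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def stationary_stream(n_items, alpha, length, seed=0):
    """Draw `length` items i.i.d. from the truncated Zipf law (assumed model).

    Items are the integers 0 .. n_items - 1, with item 0 the most frequent.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    probs = zipf_probabilities(n_items, alpha)
    return rng.choices(range(n_items), weights=probs, k=length)


if __name__ == "__main__":
    stream = stationary_stream(n_items=1000, alpha=1.1, length=10_000)
    print(stream[:10])
```

A larger `alpha` concentrates more of the traffic on a few items, which is the regime where a small exact-count block is expected to help most.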

📝 Abstract
\texttt{Elastic-Sketch} is a hash-based data structure for counting item occurrences in a data stream, and it has been shown empirically to achieve a better memory-accuracy trade-off than classical methods. The algorithm combines a \textit{heavy block}, which aims to maintain exact counts for a small set of dynamically \textit{elected} items, with a \textit{light block} that implements \texttt{Count-Min} \texttt{Sketch} (\texttt{CM}) to summarize the remaining traffic. The heavy block dynamics are governed by a hash function~$β$ that maps items into~$m_1$ buckets, and by an \textit{eviction threshold}~$λ$ that controls how easily an elected item can be replaced. We show that the performance of \texttt{Elastic-Sketch} depends strongly on the stream characteristics and on the choice of~$λ$. Since optimal parameter choices depend on unknown stream properties, we analyze \texttt{Elastic-Sketch} under a \textit{stationary random stream} model, a common assumption that captures the statistical regularities observed in real workloads. Formally, as the stream length goes to infinity, we derive closed-form expressions for the limiting distribution of the counters and for the resulting expected counting error. These expressions are efficiently computable, enabling practical grid-based tuning of the memory split between the heavy and \texttt{CM} blocks (via~$m_1$) and of the eviction threshold~$λ$. We further characterize the structure of the optimal eviction threshold, which substantially reduces the search space and shows how this threshold depends on the arrival distribution. Extensive numerical simulations on finite streams drawn from the Zipf distribution validate our asymptotic results.
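
The structure described in the abstract can be illustrated with a minimal Python sketch. The names `m1` and `lam` mirror $m_1$ and $λ$; everything else is an assumption made for illustration: the vote-based eviction rule (evict when negative votes reach `lam` times positive votes), the counter values after an eviction, the per-bucket flag, the salted `blake2b` hashing, and the parameters `cm_width` and `cm_depth`. The variant analyzed in the paper may differ in these details.

```python
import hashlib


def _hash(item, seed, modulus):
    """Deterministic salted hash of `item` into range(modulus)."""
    digest = hashlib.blake2b(
        repr(item).encode(), digest_size=8, salt=seed.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big") % modulus


class ElasticSketch:
    """Illustrative heavy block plus Count-Min light block (assumed details)."""

    def __init__(self, m1, cm_width, cm_depth, lam):
        self.m1, self.lam = m1, lam
        self.cm_width, self.cm_depth = cm_width, cm_depth
        # Heavy bucket layout: [key, positive votes, negative votes, flag].
        # flag is True once the bucket has seen an eviction, meaning part of
        # the current key's count may live in the light block.
        self.heavy = [[None, 0, 0, False] for _ in range(m1)]
        self.cm = [[0] * cm_width for _ in range(cm_depth)]

    def _cm_add(self, item, count):
        for row in range(self.cm_depth):
            self.cm[row][_hash(item, row + 1, self.cm_width)] += count

    def _cm_query(self, item):
        return min(
            self.cm[row][_hash(item, row + 1, self.cm_width)]
            for row in range(self.cm_depth)
        )

    def insert(self, item):
        bucket = self.heavy[_hash(item, 0, self.m1)]
        if bucket[1] == 0:                       # empty bucket: elect item
            bucket[:] = [item, 1, 0, False]
        elif bucket[0] == item:                  # elected item seen again
            bucket[1] += 1
        else:
            bucket[2] += 1                       # vote against the elected item
            if bucket[2] >= self.lam * bucket[1]:
                # Evict: move the old key's count to the light block and
                # elect the newcomer (assumed reset values).
                self._cm_add(bucket[0], bucket[1])
                bucket[:] = [item, 1, 1, True]
            else:
                self._cm_add(item, 1)            # newcomer goes to light block

    def query(self, item):
        bucket = self.heavy[_hash(item, 0, self.m1)]
        if bucket[0] == item and bucket[1] > 0:
            extra = self._cm_query(item) if bucket[3] else 0
            return bucket[1] + extra
        return self._cm_query(item)


if __name__ == "__main__":
    sketch = ElasticSketch(m1=4, cm_width=16, cm_depth=2, lam=8)
    for x in ["a", "b", "a", "c", "a"]:
        sketch.insert(x)
    print(sketch.query("a"))
```

In this sketch a small `lam` makes elected items easy to replace, while a large `lam` makes the heavy block sticky; this is the sensitivity that motivates tuning $λ$ together with the memory split.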
Problem

Research questions and friction points this paper is trying to address.

Elastic-Sketch
stationary random stream
eviction threshold
parameter configuration
counting error
Innovation

Methods, ideas, or system contributions that make the work stand out.

Elastic-Sketch
stationary random stream
limiting distribution
eviction threshold
Count-Min Sketch