Sublime: Sublinear Error & Space for Unbounded Skewed Streams

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poor memory efficiency and linearly growing estimation error of existing frequency estimation algorithms when processing highly skewed data streams. To overcome these limitations, the authors propose Sublime, a novel framework that introduces variable-length counters and an intra-cache-line expansion mechanism. Built upon Count-Min Sketch and Count Sketch, Sublime employs efficient bit-manipulation routines to enable dynamic counter expansion and fast access. The framework adaptively handles the skewness and unbounded growth inherent in data streams, achieving sublinear space complexity and sublinear error bounds. Theoretical analysis and extensive experiments demonstrate that Sublime significantly outperforms state-of-the-art methods in both memory efficiency and estimation accuracy while maintaining high throughput.
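For context, the baseline that Sublime builds on can be sketched as follows. This is a minimal, textbook Count-Min Sketch, not the authors' code; the class name, the BLAKE2b-based row hashing, and the `wasted_bits` helper are assumptions made for illustration. The standard guarantee is that with width w = ceil(e/ε) and depth d = ceil(ln(1/δ)), an estimate exceeds the true count by at most εN with probability at least 1 − δ, where N is the total stream length. With w fixed, that bound grows linearly in N, which is the error behavior the summary describes.

```python
import hashlib


class CountMinSketch:
    """Textbook Count-Min Sketch: a fixed grid of fixed-width counters."""

    def __init__(self, width: int, depth: int, counter_bits: int = 32):
        self.width = width
        self.depth = depth
        self.counter_bits = counter_bits
        self.max_value = (1 << counter_bits) - 1
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Illustrative choice: one hash per row, derived by salting BLAKE2b
        # with the row number.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for r in range(self.depth):
            i = self._index(key, r)
            # Saturate instead of wrapping when a fixed-width counter is full.
            self.rows[r][i] = min(self.rows[r][i] + count, self.max_value)

    def estimate(self, key: str) -> int:
        # Collisions only ever add to a counter, so the minimum across rows
        # is the tightest overestimate.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))

    def wasted_bits(self) -> int:
        # Bits holding leading zeros across all counters (hypothetical helper).
        return sum(
            self.counter_bits - value.bit_length()
            for row in self.rows
            for value in row
        )


# A small skewed stream: one hot key and ten keys seen once each.
sketch = CountMinSketch(width=64, depth=4)
for _ in range(1000):
    sketch.add("hot")
for n in range(10):
    sketch.add(f"cold-{n}")

total_bits = sketch.width * sketch.depth * sketch.counter_bits
print(sketch.estimate("hot"), sketch.wasted_bits(), total_bits)
```

On a skewed stream like this one, most counters hold zero or a very small value, so nearly all of their 32 bits are leading zeros. That is the memory waste the summary attributes to uniformly sized counters.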

📝 Abstract
Modern stream processing systems must often track the frequency of distinct keys in a data stream in real time. Since monitoring the exact counts often entails a prohibitive memory footprint, many applications rely on compact, probabilistic data structures called frequency estimation sketches to approximate them. However, mainstream frequency estimation sketches fall short in two critical aspects: (1) They are memory-inefficient under data skew. This is because they use uniformly sized counters to track the key counts and thus waste memory on storing the leading zeros of many small counter values. (2) Their estimation error deteriorates at least linearly with the stream's length, which may grow indefinitely over time. This is because they count the keys using a fixed number of counters. We present Sublime, a framework that generalizes frequency estimation sketches to address these problems by dynamically adapting to the stream's skew and length. To save memory under skew, Sublime uses short counters upfront and elongates them with extensions stored within the same cache line as they overflow. It leverages novel bit manipulation routines to quickly access a counter's extension. It also controls the scaling of its error rate by expanding its number of approximate counters as the stream grows. We apply Sublime to Count-Min Sketch and Count Sketch. We show, theoretically and empirically, that Sublime significantly improves accuracy and memory over the state of the art while maintaining competitive or superior performance.
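The idea of short counters that grow on overflow can be illustrated with a toy model. This is only a sketch under stated assumptions and does not reproduce Sublime's cache-line layout or its bit manipulation routines: the class name is hypothetical, extensions live in a Python dictionary rather than in the same cache line, and the bit accounting ignores the metadata needed to locate an extension.

```python
class VariableLengthCounters:
    """Toy counters: a short base field plus an extension created on overflow.

    Illustrative only. Extensions are kept in a dict here, whereas the
    abstract describes them as stored within the same cache line.
    """

    def __init__(self, size: int, base_bits: int = 4):
        self.base_bits = base_bits
        self.base_mask = (1 << base_bits) - 1
        self.base = [0] * size   # low-order bits of every counter
        self.ext = {}            # slot index -> high-order bits, if any

    def get(self, slot: int) -> int:
        # Recombine the extension (if present) with the short base field.
        return (self.ext.get(slot, 0) << self.base_bits) | self.base[slot]

    def increment(self, slot: int, count: int = 1) -> None:
        value = self.get(slot) + count
        self.base[slot] = value & self.base_mask
        high = value >> self.base_bits
        if high:
            # The counter overflowed its base field: allocate or grow its extension.
            self.ext[slot] = high

    def bits_used(self) -> int:
        # Payload bits only; locating an extension would cost extra metadata.
        base_total = self.base_bits * len(self.base)
        return base_total + sum(high.bit_length() for high in self.ext.values())


# One heavily hit slot and ten slots hit once, out of 256.
counters = VariableLengthCounters(size=256)
counters.increment(0, 1000)
for slot in range(1, 11):
    counters.increment(slot)

print(counters.get(0), counters.bits_used())  # 1000 1030
```

In this toy accounting, 256 counters occupy 1030 payload bits, against 8192 bits for 256 fixed 32-bit counters, because only the single large counter pays for an extension.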
Problem

Research questions and friction points this paper is trying to address.

frequency estimation
data skew
unbounded streams
memory efficiency
estimation error
Innovation

Methods, ideas, or system contributions that make the work stand out.

sublinear error
adaptive counters
frequency estimation
streaming algorithms
memory efficiency