🤖 AI Summary
To track frequent items (heavy hitters and heavy changers) accurately and in real time in data streams under tight resource constraints, this paper proposes Hidden Sketch, a reversible sketch framework that aims for both high accuracy and memory efficiency. Conventional sketches inherently trade accuracy against space; this approach is instead the first to integrate a Reversible Bloom Filter (RBF) with a Count-Min (CM) Sketch, enabling lossless joint reconstruction of key-frequency pairs. The paper provides a rigorous theoretical proof of reversibility and designs a compact encoding scheme. Theoretically, the framework attains optimal space complexity, O(1/ε), for ε-approximate frequency estimation. Empirically, it achieves up to 3.2× higher accuracy than state-of-the-art methods under identical memory budgets, supports microsecond-scale per-item updates, and enables millisecond-scale full reconstruction of the entire frequency distribution.
📝 Abstract
Modern data stream applications demand memory-efficient solutions for accurately tracking frequent items, such as heavy hitters and heavy changers, under strict resource constraints. Traditional sketches face inherent accuracy-memory trade-offs: they either lose precision to reduce memory usage or inflate memory costs to enable high recording capacity. This paper introduces Hidden Sketch, a space-efficient reversible data structure for key and frequency encoding. Our design uniquely combines a Reversible Bloom Filter (RBF) and a Count-Min (CM) Sketch for invertible key and frequency storage, enabling precise reconstruction of both keys and their frequencies with minimal memory. Theoretical analysis establishes Hidden Sketch's space complexity and guaranteed reversibility, while extensive experiments demonstrate its substantial improvements in accuracy and space efficiency in frequent item tracking tasks. By eliminating the trade-off between reversibility and space efficiency, Hidden Sketch provides a scalable foundation for real-time stream analytics in resource-constrained environments.
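For readers unfamiliar with the frequency-storage building block named above, the following is a minimal sketch of a plain Count-Min Sketch in Python. It is not Hidden Sketch and does not reproduce the paper's encoding or its Reversible Bloom Filter. The class name `CountMinSketch`, the parameters `epsilon` and `delta`, and the use of salted BLAKE2b as the per-row hash are all assumptions made for illustration. The sizing shown is the textbook one (width `ceil(e/ε)`, depth `ceil(ln(1/δ))`), under which an estimate never falls below the true count and exceeds it by more than ε times the total stream count with probability at most δ.

```python
import hashlib
import math


class CountMinSketch:
    """Plain Count-Min Sketch (illustrative only, not the paper's structure)."""

    def __init__(self, epsilon: float, delta: float):
        # Textbook sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _index(self, key: str, row: int) -> int:
        # Salting with the row number yields one hash function per row.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: str, count: int = 1) -> None:
        # Add the count to one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def query(self, key: str) -> int:
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


cms = CountMinSketch(epsilon=0.01, delta=0.01)
for _ in range(100):
    cms.update("flow-a")
for _ in range(10):
    cms.update("flow-b")

print(cms.width, cms.depth)
print(cms.query("flow-a"), cms.query("flow-b"))
```

Note that this structure can only answer queries for keys the caller already knows; it cannot enumerate which keys were inserted. According to the abstract, that is the gap Hidden Sketch addresses by pairing the CM Sketch with a Reversible Bloom Filter so that keys and frequencies can be reconstructed together.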