🤖 AI Summary
To address the limitations of single-key frequency estimation in high-speed data streams—namely, low accuracy, slow update speed, and reliance on ground-truth labels under strict memory constraints—this paper proposes UCL-sketch, an unsupervised online learning sketch. Methodologically, it introduces (1) a novel label-free online training mechanism that enables real-time, self-adaptive parameter updates via equivalent learning, and (2) a hierarchical logical estimation bucket architecture that balances fine-grained accuracy and computational scalability within bounded memory. Its equation-driven sketch framework supports efficient incremental inference. Experiments on both real-world and synthetic datasets demonstrate that UCL-sketch significantly outperforms state-of-the-art methods—including Count-Min and DeepSketch—reducing single-key estimation error by 40%–65%, achieving superior frequency distribution fitting, while maintaining comparable memory overhead.
📝 Abstract
Estimating the frequency of items in high-volume, fast data streams has been studied extensively in many areas, such as databases and network measurement. Traditional sketch algorithms can give only very rough estimates under a limited memory budget. Some learning-augmented algorithms have been proposed recently, but their offline frameworks require actual frequencies for training, which are generally difficult to obtain, and they are too slow for real-time processing while their accuracy remains coarse-grained. To this end, we propose a more practical learning-based estimation framework, UCL-sketch, which follows the line of equation-based sketches to estimate per-key frequencies. In a nutshell, it relies on two key techniques: online training via equivalent learning without ground truth, and a highly scalable architecture with logical estimation buckets. We conducted experiments on both real-world and synthetic datasets. The results demonstrate that our method greatly outperforms existing state-of-the-art sketches in terms of per-key accuracy and distribution, while preserving resource efficiency. Our code is included in the supplementary material and will be made publicly available at https://github.com/Y-debug-sys/UCL-sketch.
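For readers unfamiliar with the traditional sketches used as the baseline here, the following is a minimal sketch of a classic Count-Min structure. It is not UCL-sketch and is not taken from the authors' repository; the class name, the `width` and `depth` parameters, and the salted BLAKE2b hashing are assumptions chosen for illustration only. It shows why such sketches give rough estimates under limited memory: keys that collide in a counter inflate each other's counts, so a query can overestimate but never underestimate.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # One hash function per row, obtained by salting with the row number
        # (an illustrative choice, not the hashing used in the paper).
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: str, count: int = 1) -> None:
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def query(self, key: str) -> int:
        # The minimum over rows is the estimate least inflated by collisions.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


# Toy stream: the estimate is always at least the true frequency.
stream = ["a"] * 5 + ["b"] * 3 + ["c"]
true_counts = Counter(stream)

cms = CountMinSketch(width=16, depth=3)
for item in stream:
    cms.update(item)

for key, true_freq in true_counts.items():
    print(key, "true:", true_freq, "estimate:", cms.query(key))
```

With a fixed `width`, more distinct keys mean more collisions and larger overestimates, which is the accuracy limitation that learning-based estimators such as the one proposed here aim to reduce.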