Optimized Learned Count-Min Sketch

📅 2025-12-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Estimating element frequencies in high-throughput data streams is hard when rigorous error guarantees are required. Method: This paper proposes Optimized Learned Count-Min Sketch (OptLCMS), a learned sketch whose parameters are derived analytically, in contrast to Learned Count-Min Sketch (LCMS), which is tuned empirically and has no theoretical guarantee on the probability of intolerable error. OptLCMS partitions the input domain, assigns each partition its own Count-Min Sketch (CMS) instance, derives the CMS parameters analytically for fixed thresholds, and optimizes the thresholds via dynamic programming with approximate feasibility checks. Contribution/Results: The method provides theoretical guarantees on the intolerable error probability under its assumptions and lets the user set the allowable error threshold explicitly. In experiments, OptLCMS builds faster than LCMS, achieves a lower intolerable error probability, and matches LCMS in estimation accuracy.

📝 Abstract

Count-Min Sketch (CMS) is a memory-efficient data structure for estimating the frequency of elements in a multiset. Learned Count-Min Sketch (LCMS) enhances CMS with a machine learning model to reduce estimation error under the same memory usage, but suffers from slow construction due to empirical parameter tuning and lacks theoretical guarantees on intolerable error probability. We propose Optimized Learned Count-Min Sketch (OptLCMS), which partitions the input domain and assigns each partition to its own CMS instance, with CMS parameters analytically derived for fixed thresholds, and thresholds optimized via dynamic programming with approximate feasibility checks. This reduces the need for empirical validation, enabling faster construction while providing theoretical guarantees under these assumptions. OptLCMS also allows explicit control of the allowable error threshold, improving flexibility in practice. Experiments show that OptLCMS builds faster, achieves lower intolerable error probability, and matches the estimation accuracy of LCMS.
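For readers unfamiliar with the base structure, a plain Count-Min Sketch can be sketched as follows. This is a minimal illustration of the classical data structure only, not the learned or optimized variants described in the abstract; the class name, the use of `blake2b` as the hash family, and the toy stream are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min Sketch: `depth` rows of `width` counters each."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # total count of all insertions

    def _index(self, item: str, row: int) -> int:
        # One salted hash per row. blake2b is a stand-in chosen for
        # reproducibility, not a choice taken from the paper.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def estimate(self, item: str) -> int:
        # Collisions only ever add to a counter, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


sketch = CountMinSketch(width=256, depth=4)
stream = ["a"] * 50 + ["b"] * 7 + ["c"] * 1
for token in stream:
    sketch.add(token)
```

The estimate never falls below the true count; it can only overshoot when other elements collide with the queried one in every row, which is the error that a learned variant tries to reduce.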

Problem

Research questions and friction points this paper is trying to address.

LCMS construction is slow because its parameters are tuned empirically

LCMS lacks theoretical guarantees on the intolerable error probability

The allowable error threshold should be explicitly controllable by the user
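For context, the kind of guarantee at stake can be illustrated with the classical bound for a plain CMS. This is the standard textbook result, not the bound derived in the paper; the symbols $\varepsilon$, $\delta$, $w$, $d$, $N$, $f_x$ and $\hat{f}_x$ are introduced here for illustration. With non-negative updates, pairwise-independent hash functions, and total stream count $N$:

```latex
% Classical Count-Min Sketch guarantee (not the OptLCMS bound).
% w = number of counters per row, d = number of rows.
\[
  w = \left\lceil \frac{e}{\varepsilon} \right\rceil,
  \qquad
  d = \left\lceil \ln \frac{1}{\delta} \right\rceil
  \quad\Longrightarrow\quad
  f_x \le \hat{f}_x
  \quad\text{and}\quad
  \Pr\!\left[\, \hat{f}_x > f_x + \varepsilon N \,\right] \le \delta .
\]
% Reasoning: in one row the expected collision mass is at most N / w,
% so by Markov's inequality the row overshoots by more than
% \varepsilon N with probability at most 1/e; all d independent rows
% overshoot together with probability at most e^{-d} \le \delta.
```

Here $\varepsilon N$ plays the role of an allowable error threshold and $\delta$ the probability of exceeding it.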

Innovation

Methods, ideas, or system contributions that make the work stand out.

Partitions input domain into separate CMS instances

Analytically derives CMS parameters for fixed thresholds

Optimizes thresholds via dynamic programming with feasibility checks
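The general shape of threshold selection by dynamic programming can be sketched as an interval-partition problem. The code below is an illustrative sketch under strong assumptions and is not the paper's algorithm: elements are assumed to be pre-sorted by a model score, the function name `best_thresholds` is hypothetical, the per-partition cost (number of elements times total frequency, a rough proxy for collision error when every partition gets a CMS of equal width) is an assumption, and the approximate feasibility checks mentioned above are omitted.

```python
from typing import List, Tuple


def best_thresholds(freqs: List[int], k: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Split `freqs` into exactly k contiguous groups with minimal total cost.

    `freqs` holds element frequencies, assumed sorted by model score.
    Returns the optimal cost and the groups as half-open index ranges.
    """
    n = len(freqs)
    prefix = [0]
    for f in freqs:
        prefix.append(prefix[-1] + f)

    def cost(i: int, j: int) -> int:
        # Hypothetical proxy for elements i..j-1 sharing one sketch:
        # (number of elements) * (total frequency in the group).
        return (j - i) * (prefix[j] - prefix[i])

    inf = float("inf")
    # dp[g][j] = best cost of covering the first j elements with g groups.
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0

    for g in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(g - 1, j):
                candidate = dp[g - 1][i] + cost(i, j)
                if candidate < dp[g][j]:
                    dp[g][j] = candidate
                    cut[g][j] = i

    # Walk the stored cut points backwards to recover the groups.
    groups = []
    j = n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append((i, j))
        j = i
    groups.reverse()
    return dp[k][n], groups


total_cost, groups = best_thresholds([100, 90, 5, 4, 3, 2], k=2)
```

On this toy input the two heavy elements end up in their own partition, separated from the light tail. Each group boundary corresponds to one score threshold, and each group would then be backed by its own CMS instance.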
