MagnifierSketch: Quantile Estimation Centered at One Point

📅 2025-11-26
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency in per-key quantile estimation over high-throughput data streams—where existing single-point approaches suffer from inherent bottlenecks—this paper introduces the first dedicated sketch structure tailored for this task. The method leverages three novel mechanisms: Value Focus (prioritizing high-impact value ranges), Distribution Calibration (dynamically adjusting internal distribution assumptions), and Double Filtration (hierarchical noise suppression). These components jointly ensure unbiased estimation under rigorous theoretical guarantees, while achieving sublinear space and time complexity. Experimental evaluations demonstrate that the approach significantly outperforms state-of-the-art baselines in average estimation error. Furthermore, it has been integrated into RocksDB, where it effectively reduces tail-latency overhead in production query workloads.

📝 Abstract
In this paper, we consider quantile estimation in the data stream model, where every item in the stream is a key-value pair. Researchers sometimes aim to estimate per-key quantiles (i.e., quantile estimation for every distinct key), and some popular use cases, such as tail latency measurement, rely on a single predefined quantile (e.g., the 0.95- or 0.99-quantile) rather than demanding arbitrary quantile estimation. However, existing algorithms are not specially designed for per-key estimation centered at one point: they cannot achieve high accuracy in this problem setting, and their throughput is not sufficient to handle high-speed data streams. To solve this problem, we propose MagnifierSketch for point-quantile estimation. MagnifierSketch supports both single-key and per-key quantile estimation, and its key techniques are named Value Focus, Distribution Calibration, and Double Filtration. We provide rigorous mathematical derivations to prove the unbiasedness of MagnifierSketch and analyze its space and time complexity. Our experimental results show that the Average Error (AE) of MagnifierSketch is significantly lower than that of the state-of-the-art in both single-key and per-key settings. We also integrate MagnifierSketch into the RocksDB database to reduce quantile query latency in a real system. All code for MagnifierSketch is open-source and available on GitHub.
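The abstract does not spell out MagnifierSketch's internals, but the problem setting it targets—tracking one predefined quantile (e.g., 0.95) per distinct key with tiny per-key state—can be illustrated with a classic lightweight baseline: the Frugal-1U streaming estimator of Ma, Muthukrishnan, and Sandler. The sketch below is an illustration of the point-quantile setting under that baseline, not MagnifierSketch itself; all class and method names are hypothetical.

```python
import random


class FrugalPointQuantile:
    """Frugal-1U: track a single predefined quantile with O(1) state.

    On each item x: if x > estimate, move the estimate up with
    probability q; if x < estimate, move it down with probability
    1 - q. The estimate drifts until P(x > est) * q equals
    P(x < est) * (1 - q), i.e. until it sits at the q-quantile.
    This is a classic baseline, not the paper's algorithm.
    """

    def __init__(self, q: float):
        self.q = q
        self.estimate = 0.0

    def insert(self, x: float) -> None:
        if x > self.estimate and random.random() < self.q:
            self.estimate += 1.0
        elif x < self.estimate and random.random() < 1.0 - self.q:
            self.estimate -= 1.0


class PerKeyPointQuantile:
    """Per-key point-quantile estimation: one tiny estimator per key."""

    def __init__(self, q: float):
        self.q = q
        self.sketches: dict = {}

    def insert(self, key, value: float) -> None:
        if key not in self.sketches:
            self.sketches[key] = FrugalPointQuantile(self.q)
        self.sketches[key].insert(value)

    def query(self, key) -> float:
        return self.sketches[key].estimate
```

With fixed unit steps, this baseline converges slowly and fluctuates around the true quantile, which is exactly the accuracy/throughput gap the paper's Value Focus, Distribution Calibration, and Double Filtration mechanisms are designed to close.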
Problem

Research questions and friction points this paper is trying to address.

Estimating per-key quantiles in data streams efficiently
Focusing on a single predefined quantile, e.g., for tail latency
Improving accuracy and throughput for high-speed streams
Innovation

Methods, ideas, or system contributions that make the work stand out.

MagnifierSketch uses Value Focus for point-quantile estimation
It applies Distribution Calibration to enhance accuracy
The Double Filtration technique improves throughput on data streams