🤖 AI Summary
Concurrent online analytics over high-speed data streams, supporting diverse queries (e.g., point queries and F₁/F₂ moment estimation), remains challenging under heavy update workloads. Method: This paper proposes LMQ-Sketch, a single composite sketch that answers multiple query types concurrently with high-frequency updates; prior state-of-the-art methods cover either mixed queries or concurrency, but not both. Its cornerstone is the "Lagom" method for low-latency global querying. Lagom builds on a simple geometric argument and combines work distribution with synchronization to provide proper concurrency semantics (monotonicity of operations and intermediate value linearizability), balancing freshness, timeliness, and accuracy with a low memory footprint. Contribution/Results: LMQ-Sketch achieves global query latency below 100 μs and throughput above 2 billion updates/s, and it reduces the required memory budget by an order of magnitude compared with state-of-the-art methods. The accuracy of Lagom is both analyzed and evaluated; against these methods, LMQ-Sketch shows highly competitive throughput while adding accuracy guarantees and concurrency semantics.
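For reference, these are the standard definitions of the query types named above, assuming a stream of updates (x_t, c_t) that add c_t to item x_t. A point query returns an estimate of a single f_i, and F₁ and F₂ are the first and second frequency moments. As a worked example, the stream a, b, a, c, a (unit increments) gives f_a = 3, f_b = f_c = 1, so F₁ = 5 and F₂ = 9 + 1 + 1 = 11.

```latex
\begin{aligned}
f_i &= \sum_{t \,:\, x_t = i} c_t
  && \text{frequency of item } i \text{ (point query estimates } f_i\text{)} \\
F_k &= \sum_i f_i^{\,k}
  && F_1 = \sum_i f_i \text{ (total count when all } c_t > 0\text{)}, \quad F_2 = \sum_i f_i^{2}
\end{aligned}
```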
📝 Abstract
Data sketches balance resource efficiency with controllable approximations for extracting features from high-volume, high-rate data. Recent works address two important goals separately: (1) answering multiple types of queries from one pass, and (2) querying concurrently with updates. Integrating these directions raises several fundamental challenges, which we tackle in this work. We investigate the trade-offs to be balanced and synthesize key ideas into LMQ-Sketch, a single, composite data sketch supporting multiple queries (frequency point queries and the frequency moments F₁ and F₂) concurrently with updates. Our method 'Lagom' is a cornerstone of LMQ-Sketch for low-latency global querying (<100 μs), combining freshness, timeliness, and accuracy with a low memory footprint and high throughput (>2B updates/s). We analyze and evaluate the accuracy of Lagom, which builds on a simple geometric argument and efficiently combines work distribution with synchronization for proper concurrency semantics: monotonicity of operations and intermediate value linearizability. Compared with state-of-the-art methods, which cover only mixed queries or only concurrency, LMQ-Sketch shows highly competitive throughput while additionally providing accuracy guarantees and concurrency semantics and reducing the required memory budget by an order of magnitude. We expect the methodology to have broader impact on concurrent multi-query sketches.
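To make the idea of one sketch serving several query types concrete, here is a minimal, sequential Python sketch. It is a hypothetical illustration, not LMQ-Sketch: it assumes a Count Sketch (signed counters, median over rows) for point queries and F₂, plus an exact counter for F₁ in an insert-only stream, and it omits Lagom, concurrency, and the paper's error analysis entirely. The class and method names (`CompositeSketch`, `update`, `point`, `f1`, `f2`) are invented for this example.

```python
import hashlib
import statistics


class CompositeSketch:
    """Toy single-pass sketch answering point, F1 and F2 queries.

    Hypothetical illustration only: a Count Sketch (signed counters) serves
    point queries and F2, and a plain counter tracks F1. It is sequential
    and is NOT the LMQ-Sketch/Lagom design.
    """

    def __init__(self, width: int = 256, depth: int = 5) -> None:
        self.width = width
        self.depth = depth  # odd depth keeps the median an actual row value
        self.rows = [[0] * width for _ in range(depth)]
        self.total = 0  # exact F1 for an insert-only (cash-register) stream

    def _hash(self, row: int, key: str) -> tuple[int, int]:
        # Deterministic per-row hash: the value modulo width picks a bucket,
        # and the top bit picks a +1/-1 sign.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        return h % self.width, 1 if (h >> 63) & 1 else -1

    def update(self, key: str, count: int = 1) -> None:
        # One update touches one signed counter per row, plus the F1 counter.
        for r in range(self.depth):
            bucket, sign = self._hash(r, key)
            self.rows[r][bucket] += sign * count
        self.total += count

    def point(self, key: str) -> float:
        # Median of the per-row signed counters estimates the key's frequency.
        estimates = []
        for r in range(self.depth):
            bucket, sign = self._hash(r, key)
            estimates.append(sign * self.rows[r][bucket])
        return statistics.median(estimates)

    def f1(self) -> int:
        return self.total

    def f2(self) -> float:
        # Each row's sum of squared counters is an unbiased F2 estimator (AMS);
        # the median across rows makes large errors less likely.
        return statistics.median(sum(c * c for c in row) for row in self.rows)


sketch = CompositeSketch()
for key in ["a", "b", "a", "c", "a"]:
    sketch.update(key)
print(sketch.f1())  # 5
print(sketch.point("a"), sketch.f2())
```

Because every update writes to the same counter rows, one pass over the stream serves all three queries. Making such queries safe while updates are in flight, with monotonicity and intermediate value linearizability, is exactly the part this toy omits.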