$L_p$ Sampling in Distributed Data Streams with Applications to Adversarial Robustness

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies perfect $L_p$ sampling ($p \geq 1$) in the distributed monitoring model and its applications to adversarial robustness. It gives a perfect $L_p$ sampling protocol for all $p \geq 1$ that uses $k^{p-1} \cdot \mathrm{polylog}(n)$ bits of communication, which is optimal up to polylogarithmic factors. Building on this sampler, the authors obtain adversarially robust protocols for $F_p$-moment estimation with communication cost $\tilde{O}(k^{p-1}/\varepsilon^2)$ for $p \geq 2$, matching the non-robust lower bound of Woodruff and Zhang (STOC 2012) up to polylogarithmic factors. The same framework yields near-optimal adversarially robust protocols for counting, frequency estimation, heavy hitters, and distinct element estimation.

📝 Abstract
In the distributed monitoring model, a data stream over a universe of size $n$ is distributed over $k$ servers, who must continuously provide certain statistics of the overall dataset, while minimizing communication with a central coordinator. In such settings, the ability to efficiently collect a random sample from the global stream is a powerful primitive, enabling a wide array of downstream tasks such as estimating frequency moments, detecting heavy hitters, or performing sparse recovery. Of particular interest is the task of producing a perfect $L_p$ sample, which given a frequency vector $f \in \mathbb{R}^n$, outputs an index $i$ with probability $\frac{f_i^p}{\|f\|_p^p}+\frac{1}{\mathrm{poly}(n)}$. In this paper, we resolve the problem of perfect $L_p$ sampling for all $p\ge 1$ in the distributed monitoring model. Specifically, our algorithm runs in $k^{p-1} \cdot \mathrm{polylog}(n)$ bits of communication, which is optimal up to polylogarithmic factors. Utilizing our perfect $L_p$ sampler, we achieve adversarially-robust distributed monitoring protocols for the $F_p$ moment estimation problem, where the goal is to provide a $(1+\varepsilon)$-approximation to $f_1^p+\ldots+f_n^p$. Our algorithm uses $\frac{k^{p-1}}{\varepsilon^2}\cdot\mathrm{polylog}(n)$ bits of communication for all $p\ge 2$ and achieves optimal bounds up to polylogarithmic factors, matching lower bounds by Woodruff and Zhang (STOC 2012) in the non-robust setting. Finally, we apply our framework to achieve near-optimal adversarially robust distributed protocols for central problems such as counting, frequency estimation, heavy-hitters, and distinct element estimation.
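To make the definitions in the abstract concrete, here is a minimal Python sketch of the quantities involved: the $F_p$ moment and the target distribution $f_i^p / \|f\|_p^p$ that a perfect $L_p$ sampler aims to match. This is not the paper's protocol. It is a centralized reference computation that assumes full access to the global frequency vector and ignores communication entirely. The function names, the three-server toy data, and the use of absolute values (so that negative entries are also handled) are illustrative assumptions, not taken from the paper.

```python
import random


def fp_moment(f, p):
    """F_p = sum_i |f_i|^p for a frequency vector f."""
    return sum(abs(x) ** p for x in f)


def lp_distribution(f, p):
    """Target distribution of a perfect L_p sampler: Pr[i] = |f_i|^p / ||f||_p^p."""
    total = fp_moment(f, p)
    if total == 0:
        raise ValueError("frequency vector is all zeros")
    return [abs(x) ** p / total for x in f]


def lp_sample(f, p, rng=random):
    """Draw one index from the exact L_p distribution.

    Centralized reference only: it reads all of f, which a distributed
    protocol is specifically trying to avoid.
    """
    weights = [abs(x) ** p for x in f]
    return rng.choices(range(len(f)), weights=weights, k=1)[0]


# Hypothetical toy instance: k = 3 servers, universe size n = 4.
local = [
    [3, 0, 1, 0],  # server 1's local frequency vector
    [1, 0, 0, 0],  # server 2
    [0, 2, 1, 0],  # server 3
]

# The global frequency vector is the coordinate-wise sum of the local ones.
f = [sum(col) for col in zip(*local)]

print("global f:", f)
print("F_2:", fp_moment(f, 2))
print("L_2 target distribution:", lp_distribution(f, 2))
print("one L_2 sample:", lp_sample(f, 2, random.Random(0)))
```

In this toy instance the global vector is `[4, 2, 2, 0]`, so $F_2 = 16 + 4 + 4 = 24$ and the first index should be drawn with probability $16/24$. A distributed sampler must approximate this same distribution (up to the additive $1/\mathrm{poly}(n)$ term) without shipping every local vector to the coordinator.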
Problem

Research questions and friction points this paper is trying to address.

Developing efficient $L_p$ sampling in distributed data streams
Achieving adversarially robust $F_p$ moment estimation protocols
Creating robust distributed algorithms for frequency statistics
Innovation

Methods, ideas, or system contributions that make the work stand out.

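Perfect $L_p$ sampling in the distributed monitoring model
Communication cost of $k^{p-1} \cdot \mathrm{polylog}(n)$ bits for all $p \ge 1$, optimal up to polylogarithmic factors
Adversarially robust protocols for $F_p$ moment estimation and other frequency statistics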