$L_p$ Sampling in Distributed Data Streams with Applications to Adversarial Robustness

📅 2025-10-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies perfect $L_p$ sampling ($p \geq 1$) in distributed data streams and its applications to adversarial robustness. It proposes the first perfect $L_p$ sampling protocol applicable to all $p \geq 1$ in the distributed monitoring model, using $k^{p-1} \cdot \mathrm{polylog}(n)$ bits of communication, which is optimal up to polylogarithmic factors. Building on this sampler, it gives adversarially robust protocols for $F_p$-moment estimation with communication cost $\tilde{O}(k^{p-1}/\varepsilon^2)$ for $p \geq 2$, matching the known non-robust lower bound up to polylogarithmic factors. The same framework yields near-optimal adversarially robust protocols for counting, frequency estimation, heavy-hitter detection, and distinct element estimation.

📝 Abstract

In the distributed monitoring model, a data stream over a universe of size $n$ is distributed over $k$ servers, who must continuously provide certain statistics of the overall dataset, while minimizing communication with a central coordinator. In such settings, the ability to efficiently collect a random sample from the global stream is a powerful primitive, enabling a wide array of downstream tasks such as estimating frequency moments, detecting heavy hitters, or performing sparse recovery. Of particular interest is the task of producing a perfect $L_p$ sample, which given a frequency vector $f \in \mathbb{R}^n$, outputs an index $i$ with probability $\frac{f_i^p}{\|f\|_p^p}+\frac{1}{\mathrm{poly}(n)}$. In this paper, we resolve the problem of perfect $L_p$ sampling for all $p \ge 1$ in the distributed monitoring model. Specifically, our algorithm runs in $k^{p-1} \cdot \mathrm{polylog}(n)$ bits of communication, which is optimal up to polylogarithmic factors. Utilizing our perfect $L_p$ sampler, we achieve adversarially-robust distributed monitoring protocols for the $F_p$ moment estimation problem, where the goal is to provide a $(1+\varepsilon)$-approximation to $f_1^p+\ldots+f_n^p$. Our algorithm uses $\frac{k^{p-1}}{\varepsilon^2} \cdot \mathrm{polylog}(n)$ bits of communication for all $p \ge 2$ and achieves optimal bounds up to polylogarithmic factors, matching lower bounds by Woodruff and Zhang (STOC 2012) in the non-robust setting. Finally, we apply our framework to achieve near-optimal adversarially robust distributed protocols for central problems such as counting, frequency estimation, heavy-hitters, and distinct element estimation.
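To make the target distribution concrete, here is a minimal centralized sketch in Python. It is not the paper's protocol: it ignores communication cost entirely, sums the servers' local vectors at one place, and drops the additive $\frac{1}{\mathrm{poly}(n)}$ error term. It only illustrates what an $L_p$ sample and the $F_p$ moment are. The function names (`global_frequencies`, `fp_moment`, `lp_sample`) are hypothetical, and the sampler uses the standard exponential-scaling fact that if $e_i$ are independent $\mathrm{Exp}(1)$ variables, the index minimizing $e_i / |f_i|^p$ is distributed proportionally to $|f_i|^p$.

```python
import random


def global_frequencies(server_vectors):
    """Sum the k local frequency vectors into the global vector f.

    Assumption for illustration: every server holds a dense length-n list.
    """
    n = len(server_vectors[0])
    return [sum(vec[i] for vec in server_vectors) for i in range(n)]


def fp_moment(f, p):
    """Exact F_p moment: |f_1|^p + ... + |f_n|^p."""
    return sum(abs(x) ** p for x in f)


def lp_sample(f, p, rng):
    """Return index i with probability |f_i|^p / F_p (None if f is all zeros).

    Exponential scaling: e_i / |f_i|^p is exponential with rate |f_i|^p,
    and the minimum of independent exponentials falls on index i with
    probability rate_i / sum(rates).
    """
    best_index, best_key = None, float("inf")
    for i, x in enumerate(f):
        weight = abs(x) ** p
        if weight == 0:
            continue  # zero coordinates are never sampled
        key = rng.expovariate(1.0) / weight
        if key < best_key:
            best_index, best_key = i, key
    return best_index


if __name__ == "__main__":
    # Two hypothetical servers over a universe of size n = 3.
    servers = [[1, 0, 2], [0, 0, 1]]
    f = global_frequencies(servers)          # [1, 0, 3]
    print("F_2 =", fp_moment(f, 2))          # 1^2 + 0^2 + 3^2 = 10
    rng = random.Random(0)
    draws = [lp_sample(f, 2, rng) for _ in range(10000)]
    # Index 2 should appear in roughly 9/10 of the draws.
    print("fraction of draws equal to 2:", draws.count(2) / len(draws))
```

With $f = (1, 0, 3)$ and $p = 2$, the ideal sampling probabilities are $1/10$, $0$, and $9/10$. The difficulty the abstract addresses is producing such a sample continuously while the $k$ servers exchange only $k^{p-1} \cdot \mathrm{polylog}(n)$ bits with the coordinator, which this sketch does not attempt.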

Problem

Research questions and friction points this paper is trying to address.

Developing efficient $L_p$ sampling in distributed data streams

Achieving adversarially robust $F_p$ moment estimation protocols

Creating robust distributed algorithms for frequency statistics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Perfect $L_p$ sampling in the distributed monitoring model

Communication cost optimal up to polylogarithmic factors for all $p \geq 1$

Adversarially robust protocols for frequency moments
