The communication complexity of distributed estimation

📅 2025-11-25
🤖 AI Summary
This paper studies distributed expectation estimation in the two-party communication model: Alice and Bob hold distributions $p$ and $q$, respectively, and must estimate $\mathbb{E}_{x\sim p,\,y\sim q}[f(x,y)]$ to within additive error $\varepsilon$, where $f$ is a bounded function known to both parties. Classical random sampling needs $O(R(f)/\varepsilon^2)$ communication, which is quadratic in $1/\varepsilon$; we propose a debiasing protocol that improves this to a linear dependence, $O(R(f)/\varepsilon)$. We further design optimal protocols for the Equality (EQ) and Greater-Than (GT) functions, and prove that EQ is essentially the easiest, communication-wise, among all full-rank Boolean functions. Using lower-bound techniques based on spectral methods and discrepancy, we show that the debiasing protocol is tight for general functions.
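
For the two special cases named above, the target expectation has a simple closed form. The conventions here are assumptions for illustration, not taken from the paper: EQ is taken on a common domain $X = Y$, and GT on $X = Y = \{1, \dots, n\}$ with $\mathrm{GT}(x, y) = 1$ iff $x > y$.

$$\mathbb{E}_{x\sim p,\,y\sim q}[\mathrm{EQ}(x,y)] = \sum_{x \in X} p(x)\,q(x) = \langle p, q \rangle$$

$$\mathbb{E}_{x\sim p,\,y\sim q}[\mathrm{GT}(x,y)] = \Pr[x > y] = \sum_{x=1}^{n} p(x) \sum_{y=1}^{x-1} q(y)$$

So, for example, estimating the EQ expectation amounts to estimating the inner product of Alice's and Bob's distributions.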

📝 Abstract
We study an extension of the standard two-party communication model in which Alice and Bob hold probability distributions $p$ and $q$ over domains $X$ and $Y$, respectively. Their goal is to estimate
$$\mathbb{E}_{x \sim p,\, y \sim q}[f(x, y)]$$
to within additive error $\varepsilon$ for a bounded function $f$, known to both parties. We refer to this as the distributed estimation problem. Special cases of this problem arise in a variety of areas including sketching, databases and learning. Our goal is to understand how the required communication scales with the communication complexity of $f$ and the error parameter $\varepsilon$. The random sampling approach (estimating the mean by averaging $f$ over $O(1/\varepsilon^2)$ random samples) requires $O(R(f)/\varepsilon^2)$ total communication, where $R(f)$ is the randomized communication complexity of $f$. We design a new debiasing protocol which improves the dependence on $1/\varepsilon$ to be linear instead of quadratic. Additionally we show better upper bounds for several special classes of functions, including the Equality and Greater-than functions. We introduce lower bound techniques based on spectral methods and discrepancy, and show the optimality of many of our protocols: the debiasing protocol is tight for general functions, and our protocols for the Equality and Greater-than functions are also optimal. Furthermore, we show that among full-rank Boolean functions, Equality is essentially the easiest.
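
A minimal Python sketch of the random-sampling baseline described above, assuming $f$ takes values in $[-1, 1]$ and using Hoeffding's inequality to choose the sample count. The function names and the `bits_per_eval` parameter (a stand-in for $R(f)$) are hypothetical. The sketch simulates the sampling and only *counts* communication instead of running a real protocol for $f$, and it is not the paper's debiasing protocol.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def hoeffding_samples(eps: float, delta: float) -> int:
    """Samples k so that the mean of k i.i.d. values in [-1, 1] lies within
    eps of its expectation with probability >= 1 - delta.
    Hoeffding: 2 * exp(-k * eps**2 / 2) <= delta."""
    return math.ceil(2.0 * math.log(2.0 / delta) / eps ** 2)


def sampling_estimate(
    support_x: Sequence, p: Sequence[float],
    support_y: Sequence, q: Sequence[float],
    f: Callable[[object, object], float],
    eps: float, delta: float,
    bits_per_eval: int,  # hypothetical stand-in for R(f)
    rng: random.Random,
) -> Tuple[float, int]:
    """Random-sampling baseline: average f over k independent pairs (x_i, y_i).

    Returns (estimate, total_bits). Each evaluation of f is charged
    bits_per_eval bits, as if Alice and Bob ran a protocol for f on
    (x_i, y_i); here f is simply evaluated directly (an assumption).
    """
    k = hoeffding_samples(eps, delta)
    xs = rng.choices(support_x, weights=p, k=k)  # Alice's private samples from p
    ys = rng.choices(support_y, weights=q, k=k)  # Bob's private samples from q
    mean = sum(f(x, y) for x, y in zip(xs, ys)) / k
    return mean, k * bits_per_eval


def eq(x, y) -> float:
    """Equality function: 1 if x == y, else 0."""
    return 1.0 if x == y else 0.0


if __name__ == "__main__":
    support = [0, 1, 2]
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.3, 0.5]
    exact = sum(pi * qi for pi, qi in zip(p, q))  # <p, q>, the exact EQ expectation
    est, bits = sampling_estimate(support, p, support, q, eq,
                                  eps=0.05, delta=1e-6, bits_per_eval=8,
                                  rng=random.Random(0))
    print(f"exact={exact:.4f} estimate={est:.4f} bits={bits}")
```

Halving $\varepsilon$ roughly quadruples the sample count $k$, and with it the total communication. This is the quadratic $1/\varepsilon^2$ dependence that the debiasing protocol is designed to avoid.
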
Problem


Estimating the expected value of f(x, y) to within additive error ε
Understanding how communication scales with the communication complexity of f and with ε
Improving the quadratic dependence on 1/ε to a linear one
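
Ignoring the constants hidden in the $O(\cdot)$ notation, the improvement targeted here is a factor of $1/\varepsilon$ in total communication:

$$\frac{R(f)/\varepsilon^{2}}{R(f)/\varepsilon} = \frac{1}{\varepsilon}, \qquad \varepsilon = 0.01:\ \ \frac{1}{\varepsilon^{2}} = 10^{4} \ \text{ versus } \ \frac{1}{\varepsilon} = 10^{2}.$$
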
Innovation


A debiasing protocol improves the dependence of communication on 1/ε from quadratic to linear
Improved upper bounds for the Equality and Greater-than functions
Lower bounds based on spectral methods and discrepancy prove many of these protocols optimal