🤖 AI Summary
This paper resolves the optimal space complexity of estimating the second frequency moment $F_2$ of a data stream in the small-error regime $\varepsilon < 1/\sqrt{n}$. To close the long-standing gap between the $\Omega(n)$ lower bound and the $O(n \log n)$ upper bound, we first fully characterize the two-party communication complexity of estimating set intersection size under additive error, establishing a tight $\Omega(n \log n)$ lower bound for one-way communication in this regime. Motivated by the resulting gap between one-way and two-way communication, we design a two-pass algorithm that reconstructs the stream histogram exactly with high probability using only $O(n \log \log n)$ bits of space. Our main result is a tight space complexity characterization for $F_2$ estimation for all $\varepsilon \geq 1/n^2$: $\Theta\big(\min(n, 1/\varepsilon^2) \cdot (1 + |\log(\varepsilon^2 n)|)\big)$. This is the first asymptotic separation between single-pass and constant-pass algorithms for $F_2$.
📝 Abstract
Estimating the second frequency moment $F_2$ of a data stream up to a $(1 \pm \varepsilon)$ factor is a central problem in the streaming literature. For errors $\varepsilon > \Omega(1/\sqrt{n})$, the tight bound $\Theta\left(\log(\varepsilon^2 n)/\varepsilon^2\right)$ was recently established by Braverman and Zamir. In this work, we complete the picture by resolving the remaining regime of small error, $\varepsilon < 1/\sqrt{n}$, showing that the optimal space complexity is $\Theta\left( \min\left(n, \frac{1}{\varepsilon^2}\right) \cdot \left(1 + \left| \log(\varepsilon^2 n) \right| \right) \right)$ bits for all $\varepsilon \geq 1/n^2$, assuming a sufficiently large universe. This closes the gap between the best known $\Omega(n)$ lower bound and the straightforward $O(n \log n)$ upper bound in that range, and shows that essentially storing the entire stream is necessary for high-precision estimation.
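As a quick sanity check of the stated bound (our own arithmetic, ignoring constants and the base of the logarithm, not a derivation taken from the paper), the unified expression splits into two cases according to which term attains the minimum:

$$
\min\left(n, \frac{1}{\varepsilon^2}\right) \cdot \left(1 + \left|\log(\varepsilon^2 n)\right|\right) =
\begin{cases}
\dfrac{1 + \log(\varepsilon^2 n)}{\varepsilon^2}, & \varepsilon \geq 1/\sqrt{n}, \\[2ex]
n \left(1 + \log \dfrac{1}{\varepsilon^2 n}\right), & 1/n^2 \leq \varepsilon < 1/\sqrt{n}.
\end{cases}
$$

In the first case this agrees with the $\Theta\left(\log(\varepsilon^2 n)/\varepsilon^2\right)$ bound once $\log(\varepsilon^2 n)$ is bounded away from zero. In the second case, taking $\varepsilon = n^{-1/2-c}$ for a constant $c > 0$ gives $\varepsilon^2 n = n^{-2c}$, so the expression becomes $n(1 + 2c \log n) = \Theta(n \log n)$, matching the straightforward upper bound.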
To derive this bound, we fully characterize the two-party communication complexity of estimating the size of a set intersection up to an arbitrary additive error $\varepsilon n$. In particular, we prove a tight $\Omega(n \log n)$ lower bound for one-way communication protocols when $\varepsilon < n^{-1/2-\Omega(1)}$, in contrast to classical $O(n)$-bit protocols that use two-way communication. Motivated by this separation, we present a two-pass streaming algorithm that computes the exact histogram of a stream with high probability using only $O(n \log \log n)$ bits of space, in contrast to the $\Theta(n \log n)$ bits required in one pass even to approximate $F_2$ with small error. This yields the first asymptotic separation between one-pass and $O(1)$-pass space complexity for frequency moment estimation with small error.
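One standard way to see why set intersection is tied to $F_2$ is the identity $F_2 = |A| + |B| + 2|A \cap B|$ for the stream formed by listing the elements of $A$ followed by those of $B$. The sketch below illustrates that identity with an exact, store-everything computation; it is not the paper's reduction or its two-pass algorithm, and the function names `f2` and `intersection_size_from_f2` are hypothetical.

```python
from collections import Counter


def f2(stream):
    """Exact second frequency moment: sum of squared item frequencies.

    This is the naive baseline that stores the full histogram.
    """
    counts = Counter(stream)
    return sum(c * c for c in counts.values())


def intersection_size_from_f2(a, b):
    """Recover |A ∩ B| from F_2 of the concatenated stream A followed by B.

    Items in the intersection have frequency 2 (contributing 4 each) and
    all other items have frequency 1, so F_2 = |A| + |B| + 2 * |A ∩ B|.
    """
    a, b = set(a), set(b)
    stream = list(a) + list(b)
    return (f2(stream) - len(a) - len(b)) // 2


if __name__ == "__main__":
    a = {1, 2, 3, 4}
    b = {3, 4, 5}
    print(intersection_size_from_f2(a, b), len(a & b))
```

Under this identity, if both sets have at most $n$ elements then $F_2 \leq 4n$, so a $(1 \pm \varepsilon)$-factor estimate of $F_2$ determines $|A \cap B|$ up to an additive error of at most $2\varepsilon n$, which is the kind of error the communication problem allows.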