🤖 AI Summary
This paper resolves an open problem posed by Alon et al.: proving that random linear hashing—defined by an $n \times n$ matrix drawn uniformly from $\mathbb{F}_2^{n \times n}$—achieves expected maximum load $\Theta(\log n / \log\log n)$ when hashing $n$ balls into $n$ bins, matching the optimal bound for fully random hashing. The authors establish, for the first time, the asymptotic optimality of linear hashing under this canonical load-balancing metric. They further derive a strong tail bound: the probability that the maximum load exceeds $r \cdot \log n / \log\log n$ is at most $O(1/r^2)$. Technically, the analysis overcomes challenges arising from linear dependencies via structural properties of linear maps over finite fields, higher-order moment estimation, refined probabilistic inequalities, and precise modeling of bin-load distributions. This result settles a long-standing theoretical question on the load capacity of linear hashing, open since STOC '97.
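As an illustrative sketch of the hash family being analyzed (not the paper's proof), the snippet below simulates linear hashing over $\mathbb{F}_2$: it samples a uniformly random binary matrix, hashes $n$ distinct keys into $n$ bins via the map $x \mapsto Mx$, and reports the fullest bin alongside $\log n / \log\log n$. The choice of a $b \times u$ matrix mapping $u$-bit keys to $b$-bit bin indices, and the specific parameters, are assumptions made for the simulation.

```python
import math
import random
from collections import Counter

def sample_matrix(rows, cols, rng):
    """A uniformly random rows x cols matrix over F_2, one bitmask per row."""
    return [rng.getrandbits(cols) for _ in range(rows)]

def linear_hash(matrix, key):
    """h(x) = Mx over F_2: output bit i is the inner-product parity <row_i, x>."""
    out = 0
    for i, row in enumerate(matrix):
        out |= (bin(row & key).count("1") & 1) << i
    return out

# Hash n = 2**b distinct keys into 2**b bins and record the fullest bin.
rng = random.Random(0)
b, u = 10, 30                          # 1024 bins; keys from a 30-bit universe
n = 1 << b
keys = rng.sample(range(1 << u), n)    # n distinct balls
M = sample_matrix(b, u, rng)
loads = Counter(linear_hash(M, k) for k in keys)
print(max(loads.values()), round(math.log(n) / math.log(math.log(n)), 2))
```

Note that `linear_hash` is linear by construction: $h(x \oplus y) = h(x) \oplus h(y)$, which is exactly the structural dependence that makes the analysis harder than for a fully random function.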
📝 Abstract
We prove that hashing $n$ balls into $n$ bins via a random matrix over $\mathbf{F}_2$ yields expected maximum load $O(\log n / \log\log n)$. This matches the expected maximum load of a fully random function and resolves an open question posed by Alon, Dietzfelbinger, Miltersen, Petrank, and Tardos (STOC '97, JACM '99). More generally, we show that the maximum load exceeds $r\cdot\log n/\log\log n$ with probability at most $O(1/r^2)$.