🤖 AI Summary
This paper studies the sublinear estimation of the mean of a point set in $d$-dimensional Euclidean space: given only a small number of random samples, compute a $(1+\varepsilon)$-approximation to the true mean—the minimizer of the sum of squared distances—with probability at least $1-\delta$. We establish, for the first time, the optimal sample complexity $\Theta(\varepsilon^{-1} \log \delta^{-1})$. Two sublinear-time algorithms are proposed: (1) an accelerated gradient descent method with time complexity $O((\varepsilon^{-1} + \log\log \delta^{-1}) \log \delta^{-1} \cdot d)$; and (2) a novel geometric median-of-means framework integrating order statistics and clustering, achieving $O((\varepsilon^{-1} + \log^\gamma \delta^{-1}) \log \delta^{-1} \cdot d)$ complexity. Our key innovation is the generalization of the classical median-of-means estimator to the *geometric* median-of-means, accompanied by a unified analysis of its robustness and convergence—substantially improving estimation efficiency and theoretical guarantees under high-dimensional, sparse sampling.
📝 Abstract
We study the sublinear multivariate mean estimation problem in $d$-dimensional Euclidean space. Specifically, we aim to find the mean $\mu$ of a ground point set $A$, which minimizes the sum of squared Euclidean distances of the points in $A$ to $\mu$. We first show that a multiplicative $(1+\varepsilon)$ approximation to $\mu$ can be found with probability $1-\delta$ using $O(\varepsilon^{-1}\log \delta^{-1})$ many independent uniform random samples, and provide a matching lower bound. Furthermore, we give two sublinear time algorithms with optimal sample complexity for extracting a suitable approximate mean:

1. A gradient descent approach running in time $O((\varepsilon^{-1}+\log\log \delta^{-1})\cdot \log \delta^{-1} \cdot d)$. It optimizes the geometric median objective while being significantly faster for our specific setting than all other known algorithms for this problem.
2. An order statistics and clustering approach running in time $O\left((\varepsilon^{-1}+\log^{\gamma}\delta^{-1})\cdot \log \delta^{-1} \cdot d\right)$ for any constant $\gamma>0$.

Throughout our analysis, we also generalize the familiar median-of-means estimator to the multivariate case, showing that the geometric median-of-means estimator achieves an optimal sample complexity for estimating $\mu$, which may be of independent interest.
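To make the geometric median-of-means idea concrete, here is a minimal sketch: draw samples, split them into groups, average each group, and return the geometric median of the group means. The abstract does not specify the geometric-median solver; this sketch uses Weiszfeld's iteration, a standard method for that subproblem (the paper's own algorithms, with their stated running times, differ). Function names and parameters here are illustrative, not the paper's.

```python
import numpy as np

def geometric_median(points, iters=200, tol=1e-9):
    """Approximate the geometric median (minimizer of the sum of
    Euclidean distances) via Weiszfeld's iteration. This is a
    standard solver, used here only for illustration."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        d = np.maximum(d, tol)  # avoid division by zero at a data point
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def geometric_median_of_means(samples, k):
    """Geometric median-of-means: split the samples into k groups,
    average each group, and take the geometric median of the
    resulting group means to gain robustness over a single mean."""
    groups = np.array_split(samples, k)
    group_means = np.stack([g.mean(axis=0) for g in groups])
    return geometric_median(group_means)
```

Usage: with $O(\varepsilon^{-1}\log\delta^{-1})$ uniform samples from the point set, splitting into $\Theta(\log\delta^{-1})$ groups boosts a constant per-group success probability to $1-\delta$, mirroring the univariate median-of-means argument.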