🤖 AI Summary
This paper addresses high-dimensional entangled mean estimation in the subset-of-signals model: given $N$ independent $D$-dimensional samples sharing a common mean $\mu$, of which only an unknown $\alpha$-fraction have covariance bounded by the identity (the "good" data), the goal is to accurately estimate $\mu$. We propose the first computationally efficient algorithm achieving the information-theoretically near-optimal estimation error $f(\alpha, N) + \sqrt{D/(\alpha N)}$ (up to polylogarithmic factors), where $f(\alpha, N)$ is the optimal error of the one-dimensional problem and the second term is the sub-Gaussian rate. Our method introduces a framework integrating bias correction, iterative dimensionality reduction, and rejection sampling, augmented with a list-decodable-learning subroutine and one-dimensional robust estimators. The algorithm runs in polynomial time while attaining the optimal statistical rate dictated by the information-theoretic lower bound, providing the first polynomial-time estimator for this problem whose error matches the fundamental statistical limit.
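Restated as a displayed guarantee (assuming, as is standard for mean estimation, that error is measured in Euclidean norm; $\tilde{O}(\cdot)$ hides the polylogarithmic factors mentioned above):

$$\|\hat{\mu} - \mu\|_2 \le \tilde{O}\!\left( f(\alpha, N) + \sqrt{\frac{D}{\alpha N}} \right),$$

where $f(\alpha, N)$ is the optimal error of the one-dimensional problem and $\sqrt{D/(\alpha N)}$ is roughly the rate one would pay even if the $\alpha N$ good samples were identified in advance.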
📝 Abstract
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model. Specifically, given $N$ independent random points $x_1, \ldots, x_N$ in $\mathbb{R}^D$ and a parameter $\alpha \in (0, 1)$ such that each $x_i$ is drawn from a Gaussian with mean $\mu$ and unknown covariance, and an unknown $\alpha$-fraction of the points have covariance bounded by the identity, the goal is to estimate the common mean $\mu$. The one-dimensional version of this task has received significant attention in theoretical computer science and statistics over the past decades. Recent work [LY20; CV24] has given near-optimal upper and lower bounds for the one-dimensional setting. On the other hand, our understanding of even the information-theoretic aspects of the multivariate setting has remained limited. In this work, we design a computationally efficient algorithm achieving an information-theoretically near-optimal error. Specifically, we show that the optimal error (up to polylogarithmic factors) is $f(\alpha, N) + \sqrt{D/(\alpha N)}$, where the term $f(\alpha, N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate. Our algorithmic approach employs an iterative refinement strategy, whereby we progressively learn more accurate approximations $\hat{\mu}$ to $\mu$. This is achieved via a novel rejection sampling procedure that removes points significantly deviating from $\hat{\mu}$, in an attempt to filter out unusually noisy samples. A complication that arises is that rejection sampling introduces bias in the distribution of the remaining points. To address this issue, we perform a careful analysis of the bias, develop an iterative dimension-reduction strategy, and employ a novel subroutine inspired by list-decodable learning that leverages the one-dimensional result.
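To make the high-level loop concrete, here is a minimal Python sketch of iterative refinement with rejection of far-away points. This is an illustrative simplification, not the paper's algorithm: the actual procedure uses a carefully analyzed randomized rejection step, bias correction, dimension reduction, and a list-decodable-learning-style subroutine, and the function name and tuning constants below (`refine_mean`, `n_rounds`, `radius_scale`) are placeholder choices.

```python
import numpy as np

def refine_mean(X, alpha, n_rounds=10, radius_scale=3.0):
    """Illustrative sketch: iteratively refine a mean estimate by
    discarding points that deviate far from the current estimate.
    X is an (N, D) array; alpha is the fraction of "good"
    (bounded-covariance) samples. All constants are placeholders."""
    N, D = X.shape
    mu_hat = np.median(X, axis=0)          # crude robust initialization
    for _ in range(n_rounds):
        dists = np.linalg.norm(X - mu_hat, axis=1)
        # Reject points that deviate strongly from the current estimate.
        # The paper's rejection step is randomized and bias-corrected;
        # this hard distance threshold is only a stand-in.
        kept = X[dists <= radius_scale * np.sqrt(D)]
        if len(kept) < max(1, int(alpha * N)):
            # Fall back to the alpha*N closest points so the estimate
            # is never formed from fewer than the "good" fraction.
            kept = X[np.argsort(dists)[: max(1, int(alpha * N))]]
        mu_hat = kept.mean(axis=0)          # re-estimate from survivors
    return mu_hat
```

The fallback branch is a design choice for the sketch only: it guarantees the surviving set never shrinks below an $\alpha$-fraction of the data, mirroring the intuition that the good samples should survive each rejection round.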