🤖 AI Summary
This paper addresses robust mean estimation for high-dimensional identity-covariance Gaussians under mean-shift contamination with a constant fraction of outliers. All previously known robust estimators for this model run in time exponential in the dimension. The paper gives the first computationally efficient algorithm for the problem: for any fixed contamination rate below 1/2, it attains near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. The key contribution is showing that consistent estimation, which prior work proved to be information-theoretically possible under mean-shift contamination (in contrast to Huber contamination), can also be achieved efficiently. The result adds to a growing line of work on inference under natural noise models that lie between the fully adversarial and the purely random settings.
📝 Abstract
We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $\alpha<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-\alpha$, $x_i$ is drawn from $\mathcal{N}(\mu, I)$, where $\mu \in \mathbb{R}^d$ is the target mean; and with probability $\alpha$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that, in contrast to Huber contamination, in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.
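The sampling process in the abstract can be simulated directly, which may help make the model concrete. The following is a minimal sketch in Python with NumPy, not the paper's algorithm: the function name `sample_mean_shift`, the callback `shift_fn`, and the choice of placing every shifted center at the same far-away point are all assumptions made for illustration. It only generates data from the model and shows that the plain sample mean is biased by the outliers.

```python
import numpy as np

def sample_mean_shift(n, mu, alpha, shift_fn, rng):
    """Draw n points from the mean-shift contamination model.

    With probability 1 - alpha a point is drawn from N(mu, I);
    with probability alpha it is drawn from N(z_i, I), where the
    centers z_i are supplied by the (hypothetical) callback shift_fn.
    """
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    # Independent coin flip per sample: True marks a shifted (outlier) point.
    is_outlier = rng.random(n) < alpha
    # Start with every center equal to the target mean.
    centers = np.tile(mu, (n, 1))
    k = int(is_outlier.sum())
    if k > 0:
        # Overwrite the centers of the outliers with the chosen z_i.
        centers[is_outlier] = shift_fn(k, d, rng)
    # Identity-covariance Gaussian noise around each center.
    x = centers + rng.standard_normal((n, d))
    return x, is_outlier

rng = np.random.default_rng(0)
d, n, alpha = 5, 20000, 0.2
mu = np.zeros(d)

# Illustrative assumption: every outlier center z_i is the point (10, ..., 10).
far_shift = lambda k, d, rng: np.full((k, d), 10.0)

x, is_outlier = sample_mean_shift(n, mu, alpha, far_shift, rng)

# Error of the naive sample mean over all points.
naive_error = np.linalg.norm(x.mean(axis=0) - mu)
# Error of an oracle that knows which points are inliers (not available in practice).
clean_error = np.linalg.norm(x[~is_outlier].mean(axis=0) - mu)
print(f"naive error: {naive_error:.3f}, oracle inlier error: {clean_error:.3f}")
```

In this setup the naive mean is pulled toward the shifted centers by roughly $\alpha$ times their distance from $\mu$, while the oracle error shrinks with the number of samples. The estimator described in the abstract has to close that gap without access to the outlier labels.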