Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This paper addresses robust mean estimation for a high-dimensional identity-covariance Gaussian under mean-shift contamination affecting a constant fraction of the samples. All previously known estimators for this model run in time exponential in the dimension; the paper proposes the first polynomial-time algorithm, which formulates the problem via moment-constrained optimization and integrates spectral-analysis-driven iterative filtering with convex programming. For any fixed contamination rate below 1/2, the algorithm attains near-optimal sample complexity, approximates the target mean to any desired accuracy, and runs in time polynomial in the number of samples. The key contribution is the first estimator for the mean-shift model that combines computational efficiency, tolerance of a constant fraction of outliers, and near-optimal sample complexity.


📝 Abstract

We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $\alpha<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-\alpha$, $x_i$ is drawn from $\mathcal{N}(\mu, I)$, where $\mu \in \mathbb{R}^d$ is the target mean; and with probability $\alpha$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that, in contrast to Huber contamination, in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.
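To make the contamination model concrete, the following is a minimal sketch of the sampling process described above, using only the Python standard library. It is not the paper's algorithm: it only generates data and shows why the plain empirical mean is pulled away from $\mu$. The names `sample_mean_shift`, `shift_fn`, and `empirical_mean` are hypothetical, and the fixed shift center `[10, 0, 0]` is an assumption chosen for illustration; in the model each $z_i$ may be arbitrary.

```python
import random


def sample_mean_shift(n, mu, alpha, shift_fn, rng):
    """Draw n points from the mean-shift contamination model.

    With probability 1 - alpha a point is drawn from N(mu, I); with
    probability alpha it is drawn from N(z_i, I), where z_i = shift_fn(i)
    stands in for the unknown, arbitrary centers (hypothetical helper).
    """
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 1/2")
    points, is_outlier = [], []
    for i in range(n):
        contaminated = rng.random() < alpha
        center = shift_fn(i) if contaminated else mu
        # Identity covariance: each coordinate is an independent unit-variance Gaussian.
        points.append([rng.gauss(c, 1.0) for c in center])
        is_outlier.append(contaminated)
    return points, is_outlier


def empirical_mean(points):
    """Coordinate-wise average of a list of equal-length vectors."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


rng = random.Random(0)  # fixed seed so the run is reproducible
mu = [0.0, 0.0, 0.0]    # target mean (assumed for the demo)
points, is_outlier = sample_mean_shift(
    n=20000,
    mu=mu,
    alpha=0.2,
    shift_fn=lambda i: [10.0, 0.0, 0.0],  # assumed: every outlier shares one center
    rng=rng,
)
naive = empirical_mean(points)
outlier_fraction = sum(is_outlier) / len(is_outlier)
```

In this setup the empirical mean concentrates around $(1-\alpha)\mu + \alpha z$, so its first coordinate sits near 2 rather than 0, while the unshifted coordinates stay near 0. A robust estimator in this model has to remove that bias without knowing which points were shifted.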

Problem

Research questions and friction points this paper is trying to address.

Efficient robust mean estimation algorithm

Handles high-dimensional mean-shift contamination

Tolerates constant fraction of outliers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient multivariate robust estimation

Handles mean-shift contamination

Runs in sample-polynomial time
