🤖 AI Summary
This paper studies mean estimation over ℓₚ balls under additive Gaussian noise, focusing on whether the maximum likelihood estimator (MLE), which can be viewed as a nonlinear shrinkage procedure, is minimax rate-optimal. Through minimax analysis and constructive lower bound techniques, it shows that for p strictly between 1 and 2, the MLE is rate-suboptimal for essentially all noise levels and constraint radii at which nonlinear estimators are required, incurring polynomially larger risk than the minimax rate; with n i.i.d. Gaussian samples, this gap can be polynomial in the sample size. In contrast, for p near 1 or p at least 2, the MLE is rate-optimal for all noise levels and radii. The work characterizes the boundary between optimality and suboptimality in terms of p, the noise level, the constraint radius, and the dimension, challenging the intuition that the MLE is automatically rate-optimal and that nonlinearity alone suffices for optimality. Whenever the MLE is suboptimal, the lower bounds are constructive: explicit instances are given on which the MLE provably incurs suboptimal risk.
📝 Abstract
We revisit the problem of mean estimation on $\ell_p$ balls under additive Gaussian noise. When $p$ is strictly less than $2$, it is well understood that rate-optimal estimators must be nonlinear in the observations. In this work, we study the maximum likelihood estimator (MLE), which may be viewed as a nonlinear shrinkage procedure for mean estimation over $\ell_p$ balls. We demonstrate two phenomena for the behavior of the MLE, which depend on the noise level, the radius of the norm constraint, the dimension, and the norm index $p$. First, as a function of the dimension, for $p$ near $1$ or at least $2$, the MLE is minimax rate-optimal for all noise levels and all constraint radii. On the other hand, for $p$ strictly between $1$ and $2$, the behavior is more striking: for essentially all noise levels and radii for which nonlinear estimates are required, the MLE is minimax rate-suboptimal, despite being nonlinear in the observations. Our results also imply similar conclusions when given $n$ independent and identically distributed Gaussian samples, where we demonstrate that the MLE can be suboptimal by a polynomial factor in the sample size. Our lower bounds are constructive: whenever the MLE is rate-suboptimal, we provide explicit instances on which the MLE provably incurs suboptimal risk.
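Below is a minimal, illustrative sketch (not taken from the paper) of the setup the abstract describes: an observation $y = \theta^* + \sigma z$ with $\|\theta^*\|_p \le R$, and the constrained MLE computed as the Euclidean projection of $y$ onto the $\ell_p$ ball via a generic convex solver. The function name `mle_lp_ball` and the use of SciPy's SLSQP solver are assumptions made here for illustration only.

```python
import numpy as np
from scipy.optimize import minimize


def mle_lp_ball(y, p, radius):
    """Constrained MLE over the l_p ball: argmin_{||theta||_p <= radius} ||y - theta||_2^2.

    Equivalently, the Euclidean projection of y onto the l_p ball (convex for p >= 1).
    Solved here with a generic solver (SLSQP); an illustrative sketch, not the paper's code.
    """
    # Smooth reformulation of the constraint: radius^p - sum_i |theta_i|^p >= 0.
    cons = {"type": "ineq",
            "fun": lambda theta: radius ** p - np.sum(np.abs(theta) ** p)}
    # Start from a feasible point strictly inside the ball.
    x0 = y * min(1.0, 0.5 * radius / np.linalg.norm(y, ord=p))
    res = minimize(lambda theta: np.sum((y - theta) ** 2),
                   x0=x0, constraints=[cons], method="SLSQP")
    return res.x


# Observation model from the abstract: y = theta* + sigma * z, z ~ N(0, I_d),
# with theta* lying in the l_p ball of the given radius.
rng = np.random.default_rng(0)
d, p, radius, sigma = 50, 1.5, 1.0, 0.1
theta_star = radius * rng.dirichlet(np.ones(d))  # ||theta*||_1 = radius, hence ||theta*||_p <= radius for p >= 1
y = theta_star + sigma * rng.standard_normal(d)

theta_hat = mle_lp_ball(y, p, radius)
print("squared-error loss on this draw:", np.sum((theta_hat - theta_star) ** 2))
```

Per the abstract, for $p$ strictly between $1$ and $2$ this projection-type estimator can incur rate-suboptimal risk on particular instances, even though it is nonlinear in the observations.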