🤖 AI Summary
This paper studies noiseless linear regression with Gaussian covariates under additive oblivious contamination: $x \sim \mathcal{N}(0,\mathbf{I}_d)$, $y = x^\top \beta + z$, with $z$ independent of $x$ and $\mathbb{P}[z=0] = \alpha > 0$, so the response is uncontaminated with probability $\alpha$. The goal is to recover $\beta$ with small $\ell_2$ error. Working in the Statistical Query (SQ) model, the authors prove a lower bound: any efficient SQ algorithm requires VSTAT complexity at least $\widetilde{\Omega}(d^{1/2}/\alpha^2)$. This is formal evidence that the quadratic dependence on $1/\alpha$ is inherent for efficient algorithms. It points to an information-computation trade-off: $O(d/\alpha)$ samples suffice when computation is ignored, whereas the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples.
📝 Abstract
We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d. samples from a distribution $(x, y)$ on $\mathbb{R}^d \times \mathbb{R}$ with $x \sim \mathcal{N}(0,\mathbf{I}_d)$ and $y = x^\top \beta + z$, where $z$ is drawn independently of $x$ from an unknown distribution $E$. Moreover, $z$ satisfies $\mathbb{P}_E[z = 0] = \alpha > 0$. The goal is to accurately recover the regressor $\beta$ to small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
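To make the data model concrete, the following is a minimal simulation sketch, not the paper's method. The function name `sample_oblivious_regression`, the wide Gaussian used for the contaminated part of $E$, and the values of `d`, `alpha`, and `n` are all illustrative assumptions; the abstract only fixes $x \sim \mathcal{N}(0,\mathbf{I}_d)$, independence of $z$ and $x$, and $\mathbb{P}[z=0]=\alpha$. The least-squares fit on the clean rows is an oracle baseline that is told which samples are uncontaminated, which a real algorithm is not.

```python
import numpy as np

def sample_oblivious_regression(n, beta, alpha, rng):
    """Draw n samples (x, y) with y = x^T beta + z and P[z = 0] = alpha."""
    d = beta.shape[0]
    x = rng.standard_normal((n, d))      # covariates x ~ N(0, I_d)
    clean = rng.random(n) < alpha        # rows on which z = 0
    # Assumption: contaminated z is a wide Gaussian. The abstract leaves E
    # unknown apart from P[z = 0] = alpha; z is drawn independently of x.
    z = np.where(clean, 0.0, 10.0 * rng.standard_normal(n))
    y = x @ beta + z
    return x, y, clean

rng = np.random.default_rng(0)
d, alpha, n = 5, 0.3, 20000              # hypothetical problem sizes
beta = rng.standard_normal(d)
x, y, clean = sample_oblivious_regression(n, beta, alpha, rng)

# Oracle baseline: on the clean rows the response is exactly linear in x,
# so least squares restricted to them recovers beta up to rounding error.
beta_oracle, *_ = np.linalg.lstsq(x[clean], y[clean], rcond=None)

print("empirical clean fraction:", clean.mean())
print("oracle l2 error:", np.linalg.norm(beta_oracle - beta))
```

The oracle fit illustrates why the problem is called noiseless: once the uncontaminated samples are identified, recovery is exact. The difficulty the abstract addresses is finding $\beta$ efficiently without that knowledge.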