Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

📅 2025-10-12

🤖 AI Summary

This paper studies noiseless linear regression under additive oblivious contamination with Gaussian covariates: $x \sim \mathcal{N}(0,\mathbf{I}_d)$, $y = x^\top \beta + z$, with $z$ independent of $x$ and $\mathbb{P}[z=0] = \alpha > 0$, so the response is uncontaminated with probability $\alpha$. The goal is to recover $\beta$ with small $\ell_2$ error. Working in the statistical query (SQ) model, the authors show that any efficient SQ algorithm requires VSTAT complexity at least $\widetilde{\Omega}(d^{1/2}/\alpha^2)$. This is formal evidence that the quadratic dependence on $1/\alpha$ is inherent for efficient algorithms, even though $O(d/\alpha)$ samples suffice when computation is ignored. The result therefore points to an information-computation tradeoff: the best known polynomial-time algorithms need on the order of $d/\alpha^2$ samples, and the SQ lower bound suggests that this dependence on $\alpha$ cannot be improved efficiently.

📝 Abstract

We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d. samples from a distribution $(x, y)$ on $\mathbb{R}^d \times \mathbb{R}$ with $x \sim \mathcal{N}(0,\mathbf{I}_d)$ and $y = x^\top \beta + z$, where $z$ is drawn independently of $x$ from an unknown distribution $E$. Moreover, $z$ satisfies $\mathbb{P}_E[z = 0] = \alpha > 0$. The goal is to accurately recover the regressor $\beta$ to small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
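The data model in the abstract can be simulated with a minimal sketch like the one below. The function name `sample_oblivious_regression`, the Cauchy contamination, and the `noise_scale` parameter are illustrative assumptions, not choices made in the paper; the model only requires that $z$ be independent of $x$ and equal zero with probability $\alpha$. The final least-squares fit is an oracle check that uses the hidden clean mask, which a real algorithm does not observe.

```python
import numpy as np

def sample_oblivious_regression(n, beta, alpha, rng, noise_scale=10.0):
    """Draw n samples with x ~ N(0, I_d) and y = x @ beta + z.

    z is independent of x and is exactly 0 with probability alpha.
    Otherwise it is Cauchy-distributed (an assumed, hypothetical choice
    for the unknown contamination distribution E).
    """
    d = beta.shape[0]
    X = rng.standard_normal((n, d))
    clean = rng.random(n) < alpha  # hidden indicator of z == 0
    z = np.where(clean, 0.0, noise_scale * rng.standard_cauchy(n))
    y = X @ beta + z
    return X, y, clean

rng = np.random.default_rng(0)
d, n, alpha = 5, 2000, 0.3
beta = rng.standard_normal(d)
X, y, clean = sample_oblivious_regression(n, beta, alpha, rng)

# Oracle check only: restricted to the clean rows, y is exactly linear in x,
# so least squares on those rows recovers beta up to floating-point error.
beta_oracle, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)
oracle_err = np.linalg.norm(beta_oracle - beta)
```

The difficulty the paper addresses is that the clean mask is not given: an algorithm sees only `X` and `y`, and must locate the roughly $\alpha n$ exactly-linear samples among arbitrary contaminated ones.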

Problem

Research questions and friction points this paper is trying to address.

Studying noiseless linear regression with Gaussian covariates and oblivious contamination

Recovering regressor β with small ℓ₂-error under additive contamination model

Establishing computational lower bounds for efficient algorithms in contaminated regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

Statistical Query (SQ) lower bound for noiseless linear regression

Formal evidence of an information-computation tradeoff under oblivious contamination

Quadratic dependence on $1/\alpha$ shown to be inherent for efficient SQ algorithms
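
To make the scale of the gap concrete, here is a rough worked example using the three rates quoted in the abstract. The values $d = 1000$ and $\alpha = 0.01$ are assumed for illustration only, and constants and logarithmic factors are ignored, so the numbers indicate orders of magnitude rather than actual sample counts.

$$
\frac{d}{\alpha} = \frac{10^3}{10^{-2}} = 10^5, \qquad
\frac{d}{\alpha^2} = \frac{10^3}{10^{-4}} = 10^7, \qquad
\frac{d^{1/2}}{\alpha^2} \approx \frac{31.6}{10^{-4}} \approx 3.2 \times 10^5 .
$$

The ratio between the efficient and the information-theoretic rates is $(d/\alpha^2)/(d/\alpha) = 1/\alpha$, here a factor of 100. The lower-bound expression $d^{1/2}/\alpha^2$ exceeds $d/\alpha$ exactly when $\alpha < d^{-1/2}$, which holds in this example since $0.01 < 1000^{-1/2} \approx 0.032$.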

🔎 Similar Papers

Robustness, Efficiency, or Privacy: Pick Two in Machine Learning