On efficient robust regression with subquadratic samples

📅 2026-05-18

🤖 AI Summary
This work studies efficient robust linear regression with Gaussian covariates whose covariance is unknown and has condition number κ, under adversarial contamination at rate ε. The authors give a near-linear-time algorithm that uses Õ(d/ε⁴) samples and achieves prediction error O(√(εκ)) whenever εκ ≲ 1, improving on prior work. They complement it with a Statistical Query (SQ) lower bound: efficient SQ algorithms achieving error o(√(εκ)) in this regime require queries that take Ω(d²) samples to simulate. A separate low-degree polynomial lower bound gives evidence that, without an assumption such as εκ ≲ 1, efficient algorithms may need Ω̃(min{dε²κ², ε²d²}) samples to significantly outperform the trivial estimator that always outputs 0.
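To make the setting concrete, here is a minimal simulation sketch of the contamination model. It is not the paper's algorithm: it only generates Gaussian covariates with a covariance of condition number κ, corrupts an ε fraction of the responses with a simple toy adversary, and compares ordinary least squares against the trivial zero estimator. The helper names, the eigenvalue layout of the covariance, the specific corruption rule, and the choice of prediction error as ‖Σ^{1/2}(β̂ − β)‖ are all assumptions made for illustration.

```python
import numpy as np

def make_covariance(d, kappa, rng):
    # Hypothetical construction: random orthogonal basis, eigenvalues
    # spread linearly over [1/kappa, 1], so the condition number is kappa.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    eigvals = np.linspace(1.0 / kappa, 1.0, d)
    return (Q * eigvals) @ Q.T

def sample_corrupted(n, beta, Sigma, eps, rng, noise_std=1.0):
    # Clean data: x ~ N(0, Sigma), y = <x, beta> + Gaussian noise.
    d = beta.shape[0]
    L = np.linalg.cholesky(Sigma)
    X = rng.standard_normal((n, d)) @ L.T
    y = X @ beta + noise_std * rng.standard_normal(n)
    # Toy adversary (an assumption, far weaker than a worst-case one):
    # overwrite an eps fraction of responses to pull the fit toward -beta.
    idx = rng.choice(n, size=int(eps * n), replace=False)
    y[idx] = -10.0 * (X[idx] @ beta)
    return X, y

def prediction_error(beta_hat, beta, Sigma):
    # Assumed error measure: sqrt((beta_hat - beta)^T Sigma (beta_hat - beta)).
    diff = beta_hat - beta
    return float(np.sqrt(diff @ Sigma @ diff))

rng = np.random.default_rng(0)
d, n, kappa, eps = 20, 5000, 4.0, 0.05
Sigma = make_covariance(d, kappa, rng)
beta = rng.standard_normal(d)
beta /= np.sqrt(beta @ Sigma @ beta)  # normalize so the zero estimator has error 1

X, y = sample_corrupted(n, beta, Sigma, eps, rng)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # non-robust baseline

err_ols = prediction_error(beta_ols, beta, Sigma)
err_zero = prediction_error(np.zeros(d), beta, Sigma)
print(f"OLS error: {err_ols:.3f}, zero-estimator error: {err_zero:.3f}")
```

With the signal normalized this way, the zero estimator has error 1 by construction, which gives a reference point for what it means to "outperform the trivial estimator". Ordinary least squares is included only as a non-robust baseline that even this simple corruption visibly biases.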
📝 Abstract
We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $κ$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/ε^4)$ samples, where $d$ is the dimension and $ε$ is the corruption rate, and achieves prediction error $O(\sqrt{εκ})$ under the condition $εκ\lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{εκ})$ when $εκ\lesssim 1$ require queries that take $Ω(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $εκ\lesssim 1$, efficient algorithms may require $\widetilde{Ω}\left(\min\{dε^{2}κ^{2},\ ε^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.
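The three sample-size expressions in the abstract are easier to compare with numbers plugged in. The sketch below evaluates them with all constants and logarithmic factors dropped, so the outputs are orders of magnitude only, not actual sample requirements. The function name and the dictionary keys are hypothetical.

```python
def sample_scalings(d, eps, kappa):
    # Leading-order terms only; constants and log factors are ignored.
    return {
        "in_regime": eps * kappa <= 1.0,        # stands in for eps*kappa <~ 1
        "upper_bound": d / eps**4,              # O~(d / eps^4) algorithm
        "sq_barrier": d**2,                     # Omega(d^2) SQ lower bound
        "low_degree": min(d * eps**2 * kappa**2, eps**2 * d**2),
        "target_error": (eps * kappa) ** 0.5,   # sqrt(eps * kappa)
    }

s = sample_scalings(d=10**6, eps=0.1, kappa=5.0)
for name, value in s.items():
    print(f"{name}: {value}")
```

Under these simplifications, d/ε⁴ falls below d² exactly when d > ε⁻⁴, which is the regime where the title's "subquadratic" is meaningful; in the example, d = 10⁶ exceeds ε⁻⁴ = 10⁴. The low-degree expression switches from its first term to its second once κ² exceeds d.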
Problem

Research questions and friction points this paper is trying to address.

robust regression
Gaussian covariates
unknown covariance
sample complexity
condition number
Innovation

Methods, ideas, or system contributions that make the work stand out.

robust regression
sample complexity
statistical query lower bound
condition number
low-degree polynomial hardness