On efficient robust regression with subquadratic samples

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies efficient robust linear regression with Gaussian covariates whose covariance matrix is unknown and has condition number κ, when an ε fraction of the samples is corrupted. The authors give a near-linear-time algorithm that uses Õ(d/ε⁴) samples, where d is the dimension, and achieves prediction error O(√(εκ)) when εκ ≲ 1, improving over prior work. They complement it with a Statistical Query (SQ) lower bound: efficient SQ algorithms achieving error o(√(εκ)) in this regime require queries that take Ω(d²) samples to simulate. A separate low-degree polynomial lower bound gives evidence that, without assumptions such as εκ ≲ 1, efficient algorithms may need Ω̃(min{dε²κ², ε²d²}) samples to significantly outperform the trivial estimator that always guesses 0.

📝 Abstract

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $κ$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/ε^4)$ samples, where $d$ is the dimension and $ε$ is the corruption rate, and achieves prediction error $O(\sqrt{εκ})$ under the condition $εκ\lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{εκ})$ when $εκ\lesssim 1$ require queries that take $Ω(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $εκ\lesssim 1$, efficient algorithms may require $\widetilde{Ω}\left(\min\{dε^{2}κ^{2},\ ε^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.
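
To make the setting in the abstract concrete, the sketch below simulates one simple instance of it in Python. It is not the paper's algorithm. Everything specific here is an assumption made for illustration: the function names `make_corrupted_regression` and `prediction_error`, the diagonal covariance with eigenvalues spread between 1 and κ, the unit-norm regressor, the particular (non-worst-case) corruption that overwrites responses only, and the choice to measure prediction error as $\|Σ^{1/2}(\hat{β}-β)\|_2$. It only shows how ordinary least squares degrades under an $ε$ fraction of corrupted samples, and how the trivial estimator that always guesses $0$ is scored.

```python
import numpy as np


def make_corrupted_regression(n, d, eps, kappa, noise_std=1.0, seed=0):
    """Draw n samples y = <x, beta> + noise with x ~ N(0, Sigma), then corrupt
    an eps fraction of the responses. Illustrative only: a real adversary may
    also alter covariates and pick a far more damaging corruption."""
    rng = np.random.default_rng(seed)
    # Assumed covariance: diagonal, eigenvalues from 1 to kappa,
    # so the condition number is exactly kappa.
    eigvals = np.linspace(1.0, kappa, d)
    X = rng.standard_normal((n, d)) * np.sqrt(eigvals)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)  # unit-norm true regressor (assumption)
    y = X @ beta + noise_std * rng.standard_normal(n)
    # Overwrite an eps fraction of responses with values pointing
    # strongly against the true regressor.
    n_bad = int(eps * n)
    bad = rng.choice(n, size=n_bad, replace=False)
    y[bad] = -10.0 * (X[bad] @ beta)
    return X, y, beta, eigvals


def prediction_error(beta_hat, beta, eigvals):
    """||Sigma^{1/2} (beta_hat - beta)||_2 for the diagonal Sigma above."""
    return float(np.sqrt(np.sum(eigvals * (beta_hat - beta) ** 2)))


# Compare ordinary least squares on corrupted data with the trivial estimator 0.
X, y, beta, eigvals = make_corrupted_regression(n=2000, d=10, eps=0.1, kappa=4.0)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS error on corrupted data:", prediction_error(ols, beta, eigvals))
print("Trivial estimator error:    ", prediction_error(np.zeros(10), beta, eigvals))
```

Calling `make_corrupted_regression` with `eps=0.0` and the same seed returns the uncorrupted version of the same data set, which gives a clean baseline for the least-squares error.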

Problem

Research questions and friction points this paper is trying to address.

robust regression

Gaussian covariates

unknown covariance

sample complexity

condition number

Innovation

Methods, ideas, or system contributions that make the work stand out.

robust regression

sample complexity

statistical query lower bound