Sparse Linear Regression is Easy on Random Supports

📅 2025-11-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Sparse linear regression exhibits a gap between statistical and computational efficiency for worst-case design matrices: known algorithms either need exponential time (d^Ω(k)) at roughly O(k log d/ε) samples, or run in polynomial time but need O(d) samples. This work gives the first generic positive result for **arbitrary design matrices**, including those with condition number as large as 2^poly(d), under the assumption that the signal's support is chosen at random. The algorithm achieves prediction error ε using only poly(k, log d, 1/ε) samples and runs in poly(d, N) time. It does not need the conditions that earlier polynomial-time results relied on, such as random designs from well-behaved families, a very low condition number (poly(log d), as in compressed sensing), or special structural properties. The guarantee applies to random supports; whether d^o(k) run-time with o(d) samples is achievable for worst-case signals remains open.

📝 Abstract

Sparse linear regression is one of the most basic questions in machine learning and statistics. Here, we are given as input a design matrix $X \in \mathbb{R}^{N \times d}$ and measurements or labels $y \in \mathbb{R}^N$ where $y = X w^* + \xi$, and $\xi$ is the noise in the measurements. Importantly, we have the additional constraint that the unknown signal vector $w^*$ is sparse: it has $k$ non-zero entries where $k$ is much smaller than the ambient dimension. Our goal is to output a prediction vector $\widehat{w}$ that has small prediction error: $\frac{1}{N} \cdot \|X w^* - X \widehat{w}\|_2^2$. Information-theoretically, we know what is best possible in terms of measurements: under most natural noise distributions, we can get prediction error at most $\epsilon$ with roughly $N = O(k \log d/\epsilon)$ samples. Computationally, this currently needs $d^{\Omega(k)}$ run-time. Alternately, with $N = O(d)$, we can get polynomial-time. Thus, there is an exponential gap (in the dependence on $d$) between the two and we do not know if it is possible to get $d^{o(k)}$ run-time and $o(d)$ samples. We give the first generic positive result for worst-case design matrices $X$: For any $X$, we show that if the support of $w^*$ is chosen at random, we can get prediction error $\epsilon$ with $N = \text{poly}(k, \log d, 1/\epsilon)$ samples and run-time $\text{poly}(d, N)$. This run-time holds for any design matrix $X$ with condition number up to $2^{\text{poly}(d)}$. Previously, such results were known for worst-case $w^*$, but only for random design matrices from well-behaved families, matrices that have a very low condition number ($\text{poly}(\log d)$; e.g., as studied in compressed sensing), or those with special structural properties.
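To make the setup in the abstract concrete, the following is a minimal sketch of a problem instance and of the prediction-error metric $\frac{1}{N} \cdot \|X w^* - X \widehat{w}\|_2^2$. It is not the paper's algorithm. The function names `make_instance`, `dot`, and `prediction_error` are hypothetical, and the Gaussian design, Gaussian noise, Gaussian non-zero values, and the uniform choice of support are assumptions made only for illustration (the paper's result applies to any $X$).

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def make_instance(N, d, k, noise_std=0.1, seed=0):
    """Build y = X w* + xi with a k-sparse w* on a random support.

    Assumptions for illustration only: Gaussian X, Gaussian noise,
    Gaussian non-zero entries, support drawn uniformly over k-subsets.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
    support = sorted(rng.sample(range(d), k))  # random k-subset of {0..d-1}
    w_star = [0.0] * d
    for j in support:
        w_star[j] = rng.gauss(0.0, 1.0)
    y = [dot(row, w_star) + rng.gauss(0.0, noise_std) for row in X]
    return X, y, w_star, support


def prediction_error(X, w_star, w_hat):
    """Return (1/N) * ||X w* - X w_hat||_2^2."""
    N = len(X)
    return sum((dot(row, w_star) - dot(row, w_hat)) ** 2 for row in X) / N


# Usage: N is much smaller than d, and k is much smaller than both.
X, y, w_star, support = make_instance(N=50, d=200, k=5)
zero_guess = [0.0] * 200
print("support:", support)
print("error of the all-zeros predictor:", prediction_error(X, w_star, zero_guess))
```

Note that the metric compares predictions $X \widehat{w}$ against the noiseless signal $X w^*$, not against the noisy labels $y$, so a predictor can have small prediction error without recovering $w^*$ entry by entry.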

Problem

Research questions and friction points this paper is trying to address.

Achieving small prediction error efficiently in sparse linear regression when the signal's support is random.

Overcoming exponential runtime gap in worst-case design matrices.

Enabling polynomial-time prediction for design matrices with condition number as large as 2^poly(d).
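The size of the gap described above can be illustrated with a short calculation. The sketch below counts the candidate supports that a brute-force search over all $k$-subsets would enumerate, and compares that with the $k \log d/\epsilon$ sample scale quoted in the abstract. The helper names are hypothetical, and the constant `c` is an assumption, since the abstract gives the sample bound only up to constant factors.

```python
import math


def num_supports(d, k):
    """Number of k-subsets of d coordinates a brute-force search would try."""
    return math.comb(d, k)


def sample_scale(d, k, eps, c=1.0):
    """c * k * log(d) / eps, rounded up; c is an assumed constant."""
    return math.ceil(c * k * math.log(d) / eps)


# Usage: the subset count grows like d^k, the sample scale only like log d.
for d in (100, 1000, 10000):
    print(d, num_supports(d, 5), sample_scale(d, 5, eps=0.1))
```

For fixed $k$, increasing $d$ by a factor of 10 multiplies the number of supports by roughly $10^k$, while the sample scale grows only additively in $\log d$.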

Innovation

Methods, ideas, or system contributions that make the work stand out.

Random support assumption enables efficient sparse regression

Achieves prediction error ε with poly(k, log d, 1/ε) samples for any design matrix

Runs in poly(d, N) time for matrices with condition number up to 2^poly(d)
