🤖 AI Summary
This work addresses high-dimensional sparse regression when both the noise and the design matrix are heavy-tailed, a regime in which conventional methods fail because heavy-tailed covariates distort the geometry of the empirical risk. The authors propose Robust Iterative Gradient descent with Hard Thresholding (RIGHT), a unified framework that uses a robust gradient estimator to avoid higher-order moment conditions and covers both linear and logistic regression. The analysis reveals a decoupling in linear regression: the estimation error rate is governed by the noise tail index, while the sample complexity required for stability is governed by the design tail index. Logistic regression is inherently robust to heavy-tailed designs because its gradient is bounded. Matching minimax lower bounds show that RIGHT attains optimal estimation accuracy and sample complexity across these regimes, without requiring sample splitting or the existence of the population risk.
📝 Abstract
We investigate high-dimensional sparse regression when both the noise and the design matrix exhibit heavy-tailed behavior. Standard algorithms typically fail in this regime, as heavy-tailed covariates distort the empirical risk geometry. We propose a unified framework, Robust Iterative Gradient descent with Hard Thresholding (RIGHT), which employs a robust gradient estimator to bypass the need for higher-order moment conditions. Our analysis reveals a fundamental decoupling phenomenon: in linear regression, the estimation error rate is governed by the noise tail index, while the sample complexity required for stability is governed by the design tail index. This implies that while heavy-tailed noise limits precision, heavy-tailed designs primarily raise the sample size barrier for convergence. In contrast, for logistic regression, we show that the bounded gradient naturally robustifies the estimator against heavy-tailed designs, restoring standard parametric rates. We derive matching minimax lower bounds to prove that RIGHT achieves optimal estimation accuracy and sample complexity across these regimes, without requiring sample splitting or the existence of the population risk.
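As a rough illustration of the iteration the abstract describes (a gradient step using a robust gradient estimate, followed by hard thresholding), here is a minimal Python sketch for the linear-regression case. The abstract does not specify the robust gradient estimator, so the coordinate-wise median-of-means used here is an assumption, as are the function names, step size, block count, and synthetic data. It should not be read as the authors' implementation.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    if s > 0:
        keep = np.argpartition(np.abs(v), -s)[-s:]
        out[keep] = v[keep]
    return out


def mom_gradient(X, y, beta, n_blocks):
    """Coordinate-wise median-of-means estimate of the least-squares gradient.

    Assumption: this estimator is a stand-in; the source only says that
    a "robust gradient estimator" is used.
    """
    per_sample = X * (X @ beta - y)[:, None]        # per-sample gradients, shape (n, d)
    blocks = np.array_split(per_sample, n_blocks)   # contiguous blocks of samples
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)           # median across blocks, per coordinate


def robust_iht(X, y, s, step=0.5, n_blocks=10, n_iters=100):
    """Gradient descent with a robust gradient, projected onto s-sparse vectors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = mom_gradient(X, y, beta, n_blocks)
        beta = hard_threshold(beta - step * grad, s)
    return beta


# Hypothetical synthetic example: Gaussian design (for simplicity) with
# heavy-tailed Student-t noise; all values below are illustrative.
rng = np.random.default_rng(0)
n, d, s = 2000, 50, 3
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -3.0, 1.5]
X = rng.standard_normal((n, d))
y = X @ beta_true + 0.1 * rng.standard_t(df=3, size=n)

beta_hat = robust_iht(X, y, s)
print("estimated support:", np.flatnonzero(beta_hat))
print("l2 error:", np.linalg.norm(beta_hat - beta_true))
```

The sparsity level `s` is passed in as a known quantity here; in practice it would have to be chosen or tuned, and a heavy-tailed design would likely call for more samples or a more careful estimator than this sketch uses.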