Robust Regression with Adaptive Contamination in Response: Optimal Rates and Computational Barriers

📅 2026-04-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses robust regression in a setting where covariates remain uncontaminated while responses suffer adaptive corruption. Under the classical Huber contamination model, consistent estimation is impossible when the contamination proportion is a non-vanishing constant; by exploiting the clean covariates, the authors construct an estimator that remains consistent even at a constant contamination proportion and attains a strictly better rate than is possible under Huber contamination. A matching minimax lower bound is derived via Fano's inequality, using contamination processes that match $m > 2$ distributions simultaneously. Statistical Query and Low-Degree Polynomial lower bounds then give formal evidence of a substantial statistical–computational gap: the optimal statistical rate is unlikely to be attainable by polynomial-time algorithms.
📝 Abstract
We study robust regression under a contamination model in which covariates are clean while the responses may be corrupted in an adaptive manner. Unlike the classical Huber contamination model, where both covariates and responses may be contaminated and consistent estimation is impossible when the contamination proportion is a non-vanishing constant, it turns out that the clean-covariate setting admits strictly improved statistical guarantees. Specifically, we show that the additional information in the clean covariates can be carefully exploited to construct an estimator that achieves a better estimation rate than that attainable under Huber contamination. In contrast to the Huber model, this improved rate implies consistency even when the contamination proportion is a constant. A matching minimax lower bound is established using Fano's inequality together with the construction of contamination processes that match $m > 2$ distributions simultaneously, extending the previous two-point lower bound argument in Huber's setting. Despite the improvement over the Huber model from an information-theoretic perspective, we provide formal evidence -- in the form of Statistical Query and Low-Degree Polynomial lower bounds -- that the problem exhibits strong information-computation gaps. Our results strongly suggest that the information-theoretic improvements cannot be achieved by polynomial-time algorithms, revealing a fundamental gap between information-theoretic and computational limits in robust regression with clean covariates.
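To make the contamination model in the abstract concrete, here is a minimal simulation sketch of one possible reading of it: covariates are drawn cleanly, and an adversary who sees each covariate replaces a fraction of the responses. Everything specific here is an assumption for illustration, not taken from the paper: the Gaussian design, the names `simulate` and `ols`, the values of `n`, `d`, and `eps`, and the particular corruption (responses generated from the opposite coefficient vector). Ordinary least squares is used only as a non-robust baseline; it is not the estimator proposed by the authors.

```python
import numpy as np

def simulate(n=2000, d=5, eps=0.2, seed=0):
    """Clean Gaussian covariates; roughly an eps-fraction of responses corrupted."""
    rng = np.random.default_rng(seed)
    beta = np.ones(d) / np.sqrt(d)        # hypothetical true coefficients (unit norm)
    X = rng.standard_normal((n, d))       # covariates are never modified
    y_clean = X @ beta + rng.standard_normal(n)
    # Adaptive corruption (illustrative choice): the corrupted response depends
    # on the observed covariate x, here mimicking a regression on -beta.
    corrupted = rng.random(n) < eps
    y = np.where(corrupted, X @ (-beta), y_clean)
    return X, y, beta, corrupted

def ols(X, y):
    """Ordinary least squares, used only as a non-robust baseline."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

X, y, beta, corrupted = simulate()
# Error of OLS fitted on all (partly corrupted) responses.
err_contaminated = np.linalg.norm(ols(X, y) - beta)
# Error of an oracle that knows which responses are clean (not available in practice).
err_oracle = np.linalg.norm(ols(X[~corrupted], y[~corrupted]) - beta)
print(f"OLS on contaminated data: {err_contaminated:.3f}")
print(f"OLS on clean subset only: {err_oracle:.3f}")
```

In this sketch the baseline fitted on contaminated responses is pulled toward the adversary's coefficient vector, while the oracle fit on the clean subset is not. The gap between the two is the kind of loss a robust estimator tries to close.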
Problem

Research questions and friction points this paper is trying to address.

robust regression
adaptive contamination
clean covariates
information-computation gap
statistical guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

robust regression
adaptive contamination
clean covariates
minimax lower bound
information-computation gap