🤖 AI Summary
This paper investigates the fundamental limits of robustness for nonparametric regression under adversarial input contamination, focusing on regression functions in the second-order Sobolev space. The adversary may arbitrarily corrupt up to $o(n)$ of the $n$ samples. First, the paper establishes a minimax lower bound on the estimation error under such contamination. Second, it proves that when the contamination fraction vanishes asymptotically, a suitably regularized smoothing spline estimator attains the minimax-optimal convergence rate, so its estimation error tends to zero. In contrast, when a constant fraction of the samples is corrupted, no estimator can be consistent: the error cannot converge to zero. This work fills a gap in the theory of adversarial robustness for nonparametric regression, providing the first precise characterization of the trade-off between the tolerable contamination level and the achievable estimation accuracy.
📝 Abstract
In this paper, we investigate the adversarial robustness of regression, a fundamental problem in machine learning, under a setting in which an adversary can arbitrarily corrupt a subset of the input data. While the robustness of parametric regression has been extensively studied, its nonparametric counterpart remains largely unexplored. We characterize the adversarial robustness of nonparametric regression, assuming the regression function belongs to the second-order Sobolev space (i.e., it is square integrable up to its second derivative). The contribution of this paper is twofold: (i) we establish a minimax lower bound on the estimation error, revealing a fundamental limit that no estimator can overcome, and (ii) we show that, perhaps surprisingly, the classical smoothing spline estimator, when properly regularized, is robust against adversarial corruption. These results imply that if $o(n)$ out of $n$ samples are corrupted, the estimation error of the smoothing spline vanishes as $n \to \infty$. On the other hand, when a constant fraction of the data is corrupted, no estimator can guarantee a vanishing estimation error, implying the optimality of the smoothing spline in terms of the maximum tolerable number of corrupted samples.
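The smoothing spline penalizes the integrated squared second derivative of the fit. A minimal sketch of why such curvature regularization damps a handful of corrupted samples can be given with a discrete analogue: a second-difference (Whittaker-style) penalty on a grid, solved as a penalized least-squares problem. The grid size, corruption pattern, and penalty weight `lam` below are illustrative assumptions, not the paper's setup, and this plain least-squares smoother does not carry the paper's formal guarantees.

```python
import math

def second_diff_penalty_fit(y, lam):
    """Minimize sum_i (f_i - y_i)^2 + lam * sum_i (f_{i-1} - 2 f_i + f_{i+1})^2,
    a discrete analogue of the smoothing spline's curvature penalty.
    Solves the normal equations (I + lam * D^T D) f = y by Gaussian elimination,
    where D is the (n-2) x n second-difference matrix."""
    n = len(y)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0                      # identity part (data-fidelity term)
    for r in range(n - 2):                 # row r of D hits columns r, r+1, r+2
        coeffs = [(r, 1.0), (r + 1, -2.0), (r + 2, 1.0)]
        for ci, cv in coeffs:              # accumulate lam * D^T D
            for cj, cw in coeffs:
                A[ci][cj] += lam * cv * cw
    # Gaussian elimination with partial pivoting on the augmented system [A | y].
    b = list(y)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    f = [0.0] * n
    for r in range(n - 1, -1, -1):         # back substitution
        f[r] = (b[r] - sum(A[r][c] * f[c] for c in range(r + 1, n))) / A[r][r]
    return f

# Smooth ground truth on a grid, with a handful of adversarially shifted samples
# (3 of 60, mimicking the o(n)-corruption regime).
n = 60
truth = [math.sin(2 * math.pi * i / (n - 1)) for i in range(n)]
y = list(truth)
for i in (10, 30, 50):
    y[i] += 4.0                            # arbitrary adversarial shift

fit = second_diff_penalty_fit(y, lam=5.0)
mse_raw = sum((a - b) ** 2 for a, b in zip(y, truth)) / n
mse_fit = sum((a - b) ** 2 for a, b in zip(fit, truth)) / n
```

Because the truth is smooth, the curvature penalty incurs little bias, while each isolated spike is spread out and shrunk, so `mse_fit` comes out well below `mse_raw`. The value of `lam` here is hand-picked for illustration; the paper's point is that the regularization level must be chosen properly for the robustness guarantee to hold.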