🤖 AI Summary
Kernel ridge regression (KRR) incurs prohibitive memory and computational costs in large-scale settings. This paper studies low-rank approximations of KRR and makes four contributions: (i) it derives a lower bound on the minimal rank required for the approximation to retain reliable predictive power, which gives a theoretical justification for Nyström-type approximations; (ii) building on this bound, it shows that the computational cost of the Nyström method is almost linear in the sample size; (iii) it provides error estimates for approximating elementary kernel functions by elements in the range of the associated integral operator and characterizes the behavior of the norm of the underlying weight function; and (iv) it significantly extends the feasible choices of the regularization parameter. Together, these results offer theoretical insight and practical guidance for scalable kernel learning.
📝 Abstract
Kernel ridge regression is, in general, expensive in memory allocation and computation time. This paper addresses low-rank approximations and surrogates for kernel ridge regression, which overcome these difficulties. The fundamental contribution of the paper is a lower bound on the minimal rank such that the prediction power of the approximation remains reliable. Based on this bound, we demonstrate that the computational cost of the most popular low-rank approach, the Nyström method, is almost linear in the sample size. This justifies the method from a theoretical point of view. Moreover, the paper provides a significant extension of the feasible choices of the regularization parameter. The result builds on a thorough theoretical analysis of the approximation of elementary kernel functions by elements in the range of the associated integral operator. We provide estimates of the approximation error and characterize the behavior of the norm of the underlying weight function.
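To make the cost argument concrete, the following is a minimal sketch of Nyström-approximated kernel ridge regression. It is an illustration only, not the paper's construction: the Gaussian kernel, the uniform sampling of landmark points, and the names `gaussian_kernel`, `nystroem_krr_fit`, and `nystroem_krr_predict` are assumptions made here. With n samples and rank m, the dominant step costs O(n m^2) time and O(n m) memory, so the method is close to linear in n as long as m grows slowly with n.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def nystroem_krr_fit(X, y, m, lam, bandwidth=1.0, seed=0):
    """Fit rank-m Nystroem KRR; returns the landmarks and their coefficients.

    Assumption: landmarks are drawn uniformly at random from the sample.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = X[rng.choice(n, size=m, replace=False)]

    K_nm = gaussian_kernel(X, landmarks, bandwidth)          # n x m
    K_mm = gaussian_kernel(landmarks, landmarks, bandwidth)  # m x m

    # Normal equations of the restricted problem:
    #   (K_nm^T K_nm + n * lam * K_mm) alpha = K_nm^T y
    # Forming K_nm^T K_nm costs O(n m^2); the solve costs O(m^3).
    system = K_nm.T @ K_nm + n * lam * K_mm
    rhs = K_nm.T @ y
    # Least squares instead of a direct solve, since the system can be
    # badly conditioned when landmarks lie close together.
    alpha, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return landmarks, alpha


def nystroem_krr_predict(X_new, landmarks, alpha, bandwidth=1.0):
    """Evaluate the fitted predictor at the rows of X_new."""
    return gaussian_kernel(X_new, landmarks, bandwidth) @ alpha
```

A typical call would be `landmarks, alpha = nystroem_krr_fit(X, y, m=30, lam=1e-4)` followed by `nystroem_krr_predict(X_test, landmarks, alpha)`. The full n x n kernel matrix is never formed, which is where the memory saving comes from. How small m may be chosen without harming prediction is the question the paper's lower bound addresses.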