🤖 AI Summary
This paper addresses online learning of nonlinear operators mapping a Polish space into a separable Hilbert space. We propose a regularized stochastic gradient descent (RSGD) algorithm built on operator-valued kernels and vector-valued reproducing kernel Hilbert spaces (vv-RKHS), achieving efficient estimation with linear computational complexity. Our key contribution is a dimension-free convergence analysis: for general RSGD schemes we derive high-probability bounds on both the prediction and estimation errors, together with almost-sure convergence, both established for the first time. Near-optimal convergence rates are attained under either polynomially decaying or constant step sizes and regularization parameters. The theoretical guarantees rely on smoothness and structural regularity assumptions on the target operator, and extend naturally to encoder–decoder architectures. The result is an operator-learning framework that combines rigorous convergence guarantees with practical computational efficiency.
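Written out, one standard form of such a regularized SGD recursion in a vv-RKHS (a sketch of the usual update rule, not a formula quoted from the paper) is

$$
f_{t+1} \;=\; f_t \;-\; \eta_t \Big( K_{x_t}\big(f_t(x_t) - y_t\big) \;+\; \lambda_t f_t \Big), \qquad f_1 = 0,
$$

where $(x_t, y_t)$ is the sample observed at step $t$, $K_x := K(\cdot, x)$ maps the output space into the vv-RKHS induced by the operator-valued kernel $K$, $\eta_t$ is the step size, and $\lambda_t$ is the regularization parameter; both sequences are taken either polynomially decaying (online setting) or constant over a fixed horizon (finite-horizon setting).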
📝 Abstract
This paper investigates regularized stochastic gradient descent (SGD) algorithms for estimating nonlinear operators from a Polish space to a separable Hilbert space. We assume that the regression operator lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. Two significant settings are considered: an online setting with polynomially decaying step sizes and regularization parameters, and a finite-horizon setting with constant step sizes and regularization parameters. We introduce regularity conditions on the structure and smoothness of the target operator and the input random variables. Under these conditions, we provide a dimension-free convergence analysis for the prediction and estimation errors, deriving both expectation and high-probability error bounds. Our analysis demonstrates that these convergence rates are nearly optimal. Furthermore, we present a new technique for deriving bounds with high probability for general SGD schemes, which also ensures almost-sure convergence. Finally, we discuss potential extensions to more general operator-valued kernels and the encoder-decoder framework.
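As a concrete illustration of the recursion described above, the sketch below instantiates it in Python with a separable operator-valued kernel $K(x, x') = k(x, x')\,\mathrm{Id}$ (a scalar Gaussian kernel times the identity) and finite-dimensional vectors standing in for Hilbert-space outputs. The function name, default constants, and the shared decay exponent `theta` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np


def rbf_kernel(x, xp, gamma=1.0):
    """Scalar Gaussian kernel k(x, x'); the operator-valued kernel is taken
    here to be the separable kernel K(x, x') = k(x, x') * Identity."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(xp)) ** 2))


def rsgd_operator_learning(stream, eta0=0.5, lam0=0.01, theta=0.5, horizon=None):
    """Regularized kernel SGD for vector-valued outputs (illustrative sketch).

    stream  : iterable of (x_t, y_t) pairs, with y_t a finite-dimensional
              stand-in for an element of the output Hilbert space.
    horizon : if given, the step size and regularization parameter are held
              constant at horizon-dependent values (finite-horizon setting);
              otherwise they decay polynomially in t (online setting).
    """
    centers, coeffs = [], []  # f_t(x) = sum_i k(x, centers[i]) * coeffs[i]
    for t, (x_t, y_t) in enumerate(stream, start=1):
        s = t if horizon is None else horizon
        eta_t = eta0 * s ** (-theta)   # step size
        lam_t = lam0 * s ** (-theta)   # regularization parameter
        y_t = np.asarray(y_t, dtype=float)
        # Current prediction f_t(x_t).
        pred = np.zeros_like(y_t)
        for c, a in zip(centers, coeffs):
            pred = pred + rbf_kernel(x_t, c) * a
        # Regularization term shrinks every existing coefficient ...
        coeffs = [(1.0 - eta_t * lam_t) * a for a in coeffs]
        # ... and the data-fit term adds a new coefficient at the fresh sample.
        centers.append(np.asarray(x_t, dtype=float))
        coeffs.append(-eta_t * (pred - y_t))
    return centers, coeffs
```

Passing `horizon=T` fixes the step size and regularization at horizon-dependent constants, mirroring the finite-horizon setting; leaving it unset yields the polynomially decaying online schedule. Using a single exponent `theta` for both sequences is a simplification for illustration only.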