Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels

📅 2025-04-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses online learning of nonlinear operators mapping a Polish space into a separable Hilbert space. The authors propose a regularized stochastic gradient descent (RSGD) algorithm based on operator-valued kernels and vector-valued reproducing kernel Hilbert spaces (vv-RKHS), achieving efficient estimation with linear per-iteration computational complexity. The key contribution is a dimension-free convergence analysis: for general RSGD schemes, the paper derives high-probability bounds on both prediction and estimation errors, together with almost-sure convergence, both established for the first time in this setting. Near-optimal convergence rates are attained under either polynomially decaying or constant step sizes and regularization parameters. The guarantees rely on smoothness and structural-regularity assumptions on the target operator and extend naturally to encoder–decoder architectures, yielding an operator-learning framework that combines rigorous theory with practical applicability.

📝 Abstract
This paper investigates regularized stochastic gradient descent (SGD) algorithms for estimating nonlinear operators from a Polish space to a separable Hilbert space. We assume that the regression operator lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. Two significant settings are considered: an online setting with polynomially decaying step sizes and regularization parameters, and a finite-horizon setting with constant step sizes and regularization parameters. We introduce regularity conditions on the structure and smoothness of the target operator and the input random variables. Under these conditions, we provide a dimension-free convergence analysis for the prediction and estimation errors, deriving both expectation and high-probability error bounds. Our analysis demonstrates that these convergence rates are nearly optimal. Furthermore, we present a new technique for deriving bounds with high probability for general SGD schemes, which also ensures almost-sure convergence. Finally, we discuss potential extensions to more general operator-valued kernels and the encoder-decoder framework.
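The two settings named in the abstract correspond to the standard schedule choices for regularized online learning. A typical parameterization, written here only for illustration (the exact exponents and constants are assumptions, not the paper's), is:

```latex
% Online setting: polynomially decaying step sizes and regularization,
% indexed by the iteration t
\eta_t = \eta_1\, t^{-\theta}, \qquad
\lambda_t = \lambda_1\, t^{-\beta}, \qquad
\theta, \beta \in (0, 1);

% Finite-horizon setting: constants tuned to a known horizon T
\eta_t \equiv \eta(T), \qquad
\lambda_t \equiv \lambda(T), \qquad
1 \le t \le T.
```

In the online setting the algorithm runs indefinitely with decaying parameters; in the finite-horizon setting the constants are chosen up front as functions of the total sample size T.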
Problem

Research questions and friction points this paper is trying to address.

Estimating nonlinear operators using regularized stochastic gradient descent
Analyzing convergence rates for prediction and estimation errors
Extending analysis to general operator-valued kernels and encoder-decoder frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regularized SGD with operator-valued kernels
Online and finite-horizon learning settings
High-probability error bounds technique
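The first bullet's core idea can be sketched concretely. Below is a minimal, self-contained sketch of regularized kernel SGD for a vector-valued target in the separable special case K(x, x') = k(x, x')·I, where k is a scalar kernel. The Gaussian kernel, the schedules `eta`/`lam`, and all function names are illustrative assumptions, not the paper's exact construction; the estimate is kept in dual (kernel-expansion) form.

```python
import numpy as np

def gaussian_kernel(x, z, gamma=1.0):
    # Illustrative scalar kernel k(x, z); any bounded p.d. kernel works.
    return np.exp(-gamma * np.sum((x - z) ** 2))

def rsgd(stream,
         eta=lambda t: 0.5 / (t + 1) ** 0.5,    # decaying step size (assumed schedule)
         lam=lambda t: 0.01 / (t + 1) ** 0.5,   # decaying regularization (assumed schedule)
         gamma=1.0):
    """Regularized kernel SGD over an iterable of (x_t, y_t) pairs.

    The estimate is stored in dual form f_t(x) = sum_i a_i * k(x_i, x)
    with vector-valued coefficients a_i. Each step costs O(t) kernel
    evaluations, so this naive sketch is O(T^2) over T steps.
    """
    points, coefs = [], []
    for t, (x, y) in enumerate(stream):
        # Predict f_t(x_t) with the current kernel expansion.
        if points:
            pred = sum(a * gaussian_kernel(z, x, gamma)
                       for z, a in zip(points, coefs))
        else:
            pred = np.zeros_like(y)
        # Tikhonov shrinkage from the lam(t) * ||f||^2 penalty.
        shrink = 1.0 - eta(t) * lam(t)
        coefs = [shrink * a for a in coefs]
        # Gradient step on the squared loss adds one expansion point.
        points.append(x)
        coefs.append(-eta(t) * (pred - y))
    return points, coefs

# Tiny usage example: learn a synthetic linear operator y = A x.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
data = [(x, A @ x) for x in rng.normal(size=(200, 2))]
pts, cfs = rsgd(data)
x_test = np.array([0.3, -0.2])
f_hat = sum(a * gaussian_kernel(z, x_test) for z, a in zip(pts, cfs))
```

The dual representation is what makes the per-step cost linear in the number of points seen so far; production variants would truncate or sparsify the expansion to bound memory.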
Jia-Qi Yang
ByteDance
machine learning · data mining · recommender systems
Lei Shi
School of Mathematical Sciences, Fudan University, Shanghai, 200433, China; Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai, 200433, China