🤖 AI Summary
This work addresses the challenge of achieving implicit complexity control in learning algorithms without relying on explicit regularization terms. It proposes a self-regularized learning framework in which the complexity of the learned predictor is implicitly constrained by that of the simplest comparator achieving the same empirical risk, offering a unified characterization of the generalization behavior of algorithms such as gradient descent. The framework encompasses both classical regularization methods and implicit regularization mechanisms, and further enables data-driven hyperparameter selection. Theoretically, the authors establish a general self-regularization theory, derive minimax-optimal convergence rates, and provide guarantees for data-driven early stopping of gradient descent in reproducing kernel Hilbert spaces (RKHS).
📝 Abstract
We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent-based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is sufficiently rich to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms including minimax-optimal rates, where it suffices to show that the algorithm is self-regularized -- all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result which yields minimax-optimal rates up to a double logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.
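To make the early-stopping mechanism mentioned in the abstract concrete, here is a minimal sketch (our own illustration, not the paper's algorithm; the kernel, data, and step size are assumptions) of gradient descent for least-squares regression in an RKHS. The iterate stays in the span of the kernel sections at the data points, and its RKHS norm grows with the number of steps while the empirical risk shrinks, so the stopping time acts as an implicit regularization parameter.

```python
import numpy as np

# Illustrative sketch only: kernel gradient descent on the empirical
# least-squares risk (1/2n) * sum_i (f(x_i) - y_i)^2.

rng = np.random.default_rng(0)
n = 40
X = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.sin(np.pi * X) + 0.3 * rng.normal(size=n)

# Gaussian-kernel Gram matrix K_ij = k(x_i, x_j) (bandwidth is an assumption)
K = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2 * 0.2**2))

alpha = np.zeros(n)                     # dual coefficients: f_t = sum_i alpha_i k(x_i, .)
eta = n / np.linalg.eigvalsh(K).max()   # stable step size for the (1/n)-scaled risk

train_mse, rkhs_norms = [], []
for t in range(200):
    residual = K @ alpha - y                      # f_t(x_i) - y_i
    train_mse.append(float(np.mean(residual**2)))
    rkhs_norms.append(float(alpha @ K @ alpha))   # ||f_t||_H^2
    alpha -= (eta / n) * residual                 # functional gradient step

# As t grows, train_mse decreases while rkhs_norms increases: choosing
# the stopping time t trades empirical risk against predictor complexity.
```

The trade-off visible in `train_mse` versus `rkhs_norms` is exactly what a data-driven stopping rule must balance; the paper's point is that such a rule can achieve minimax-optimal rates up to a double logarithmic factor.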