🤖 AI Summary
This work addresses the lack of explicit characterizations of Lipschitz constants for feature maps induced by integral kernels, a gap that hinders robustness and stability guarantees in kernel methods. Building on functional analysis, probability integral transforms, and kernel theory, the study investigates the Lipschitz regularity of such feature maps under differentiability conditions, establishing sufficient conditions for continuity and deriving explicit formulas for the associated constants. For the first time, closed-form expressions of Lipschitz constants are provided for Gaussian kernels, ReLU random neural network kernels, and shift-invariant kernels interpreted as networks with cosine activation, revealing an equivalence between this Lipschitz property and finiteness of the second moment of the weight distribution. Numerical experiments confirm the convergence behavior of these constants in finite-width networks, and the paper concludes by posing open questions regarding their asymptotic properties.
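As a concrete instance of the kind of closed-form constant described above, the following standard RKHS computation (a sketch consistent with, but not quoted from, the paper) recovers the Lipschitz constant of the canonical feature map of the Gaussian kernel with length scale ℓ:

```latex
% Standard computation, not verbatim from the paper: for the canonical
% RKHS feature map \varphi of the Gaussian kernel
% k(x,y) = \exp\!\bigl(-\|x-y\|^2/(2\ell^2)\bigr), one has
\[
  \|\varphi(x)-\varphi(y)\|_{\mathcal{H}}^2
  = k(x,x) - 2\,k(x,y) + k(y,y)
  = 2\Bigl(1 - e^{-\|x-y\|^2/(2\ell^2)}\Bigr)
  \le \frac{\|x-y\|^2}{\ell^2},
\]
% using 1 - e^{-u} \le u. The bound is tight as x \to y, so the
% Lipschitz constant of \varphi equals 1/\ell.
```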
📝 Abstract
Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Lipschitz continuity are closely related to robustness and stability guarantees. Despite their importance, explicit characterizations of the Lipschitz constant of kernel feature maps are available only in a limited number of cases. In this paper, we study the Lipschitz regularity of feature maps associated with integral kernels under differentiability assumptions. We first provide sufficient conditions ensuring Lipschitz continuity and derive explicit formulas for the corresponding Lipschitz constants. We then identify a condition under which the feature map fails to be Lipschitz continuous and apply these results to several important classes of kernels. For infinite-width two-layer neural networks with isotropic Gaussian weight distributions, we show that the Lipschitz constant of the associated feature map can be expressed as the supremum of a two-dimensional integral, leading to an explicit characterization for the Gaussian kernel and the ReLU random neural network kernel. We also study continuous, shift-invariant kernels such as the Gaussian, Laplace, and Matérn kernels, which admit an interpretation as neural networks with a cosine activation function. In this setting, we prove that the feature map is Lipschitz continuous if and only if the weight distribution has a finite second moment, and we then derive its Lipschitz constant. Finally, we raise an open question concerning the asymptotic convergence of the Lipschitz constant in finite-width neural networks. Numerical experiments are provided to support this behavior.
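To illustrate the finite-width convergence raised in the last two sentences, here is a minimal numerical sketch (illustrative only; the width `D`, length scale `ell`, and sampling scheme are assumptions, not the paper's experimental setup). It empirically estimates the Lipschitz constant of a finite-width random Fourier feature map for the Gaussian kernel, which should be close to the infinite-width value 1/ℓ for large `D`:

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code): finite-width random
# Fourier features phi(x) = sqrt(2/D) * cos(W x + b), with rows of W drawn
# from N(0, I/ell^2) and b ~ Uniform[0, 2*pi), approximate the Gaussian
# kernel k(x, y) = exp(-||x - y||^2 / (2 ell^2)) as the width D grows.
rng = np.random.default_rng(0)
d, D, ell = 2, 5000, 1.0              # input dimension, width, length scale

W = rng.normal(scale=1.0 / ell, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    """Finite-width cosine feature map."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# Crude empirical Lipschitz estimate over random nearby pairs: for a
# differentiable feature map, the supremum of the difference quotient
# is approached in the limit y -> x.
est = 0.0
for _ in range(2000):
    x = rng.normal(size=d)
    y = x + 1e-3 * rng.normal(size=d)
    est = max(est, np.linalg.norm(phi(x) - phi(y)) / np.linalg.norm(x - y))

print(f"empirical estimate: {est:.3f}   infinite-width value: {1.0 / ell:.3f}")
```

Re-running this sketch with increasing `D` gives a rough picture of the finite-width convergence that the paper's experiments study; the estimate fluctuates around 1/ℓ with deviations that shrink as the width grows.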