🤖 AI Summary
Existing deep learning theory focuses predominantly on the ReLU activation function, leaving activations that are nonsmooth only at zero, such as SELU, ELU, and LeakyReLU, largely uncharacterized under the neural tangent kernel (NTK) and neural network Gaussian process (NNGP) frameworks, particularly with respect to their reproducing kernel Hilbert space (RKHS) structure and equivalence across depths. Method: We develop a unified analytical framework combining functional analysis and stochastic process theory to characterize the kernel-space properties of such activations in the wide-network limit. Contribution/Results: We prove that a broad class of activations that are not infinitely smooth induces equivalent RKHSs across network depths, whereas polynomial activations do not, and we characterize the sample-path smoothness of the NNGPs induced by these activations. Our results extend the theoretical scope of NTK/NNGP analysis and provide a more general foundation for studying generalization and regularity in wide neural networks.
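As a concrete illustration of the kernels being studied (this is not code from the paper), the sketch below implements the standard layer-wise NNGP/NTK recursion for a fully connected network, estimating the Gaussian expectations by Monte Carlo so that any activation of the kind discussed above (LeakyReLU, ELU, SELU, ...) can be plugged in. The function names, hyperparameters, and sample counts are illustrative assumptions.

```python
import numpy as np

def nngp_ntk(x1, x2, phi, dphi, depth=3, sigma_w=1.0, sigma_b=0.1,
             n_mc=200_000, rng=None):
    """Monte Carlo estimate of the NNGP kernel and NTK between inputs x1, x2."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = x1.shape[0]
    # Input-layer covariances (first affine layer applied to the raw inputs).
    k11 = sigma_w**2 * (x1 @ x1) / d + sigma_b**2
    k12 = sigma_w**2 * (x1 @ x2) / d + sigma_b**2
    k22 = sigma_w**2 * (x2 @ x2) / d + sigma_b**2
    theta = k12
    for _ in range(depth):
        # Draw (u, v) from the bivariate Gaussian defined by the current covariances.
        cov = np.array([[k11, k12], [k12, k22]])
        u, v = rng.multivariate_normal(np.zeros(2), cov, size=n_mc).T
        k12_new = sigma_w**2 * np.mean(phi(u) * phi(v)) + sigma_b**2
        k11_new = sigma_w**2 * np.mean(phi(u) ** 2) + sigma_b**2
        k22_new = sigma_w**2 * np.mean(phi(v) ** 2) + sigma_b**2
        # NTK recursion: Theta^{l+1} = Sigma^{l+1} + sigma_w^2 E[phi'(u) phi'(v)] * Theta^l.
        theta = k12_new + sigma_w**2 * np.mean(dphi(u) * dphi(v)) * theta
        k11, k12, k22 = k11_new, k12_new, k22_new
    return k12, theta

# Hypothetical example: LeakyReLU with slope 0.2 (nonsmooth only at zero).
leaky = lambda z, a=0.2: np.where(z > 0, z, a * z)
dleaky = lambda z, a=0.2: np.where(z > 0, 1.0, a)

x1, x2 = np.array([1.0, 0.0]), np.array([0.6, 0.8])
K, Theta = nngp_ntk(x1, x2, leaky, dleaky, depth=3)
print(f"NNGP kernel ~ {K:.4f}, NTK ~ {Theta:.4f}")
```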
📝 Abstract
While the theory of deep learning has made some progress in recent years, much of it is limited to the ReLU activation function. In particular, while the neural tangent kernel (NTK) and neural network Gaussian process kernel (NNGP) have given theoreticians tractable limiting cases of fully connected neural networks, their properties are poorly understood for most activation functions other than powers of the ReLU function. Our main contribution is to provide a more general characterization of the RKHS of these kernels for typical activation functions whose only non-smoothness is at zero, such as SELU, ELU, or LeakyReLU. Our analysis also covers a broad set of special cases, such as networks without biases, two-layer networks, and polynomial activations. Our results show that a broad class of activations that are not infinitely smooth generates equivalent RKHSs at different network depths, while polynomial activations generate non-equivalent RKHSs. Finally, we derive results for the smoothness of NNGP sample paths, characterizing the smoothness of infinitely wide neural networks at initialization.
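The last claim concerns sample paths of the NNGP, i.e., the random functions computed by infinitely wide networks at initialization. A rough way to visualize this, again only a sketch with assumed width, depth, and variance hyperparameters rather than anything taken from the paper, is to evaluate a single very wide, randomly initialized MLP on a one-dimensional input curve and compare the roughness of the resulting paths across activations.

```python
import numpy as np

def random_wide_mlp(ts, phi, depth=3, width=4096, sigma_w=1.4, sigma_b=0.1, seed=0):
    """Evaluate one randomly initialized fully connected network on a 1D input curve.

    Uses the standard NTK-style parameterization (weights scaled by sigma_w / sqrt(fan_in)),
    so as width grows the output at initialization approximates an NNGP sample path.
    """
    rng = np.random.default_rng(seed)
    # Embed the scalar parameter t as (cos t, sin t) so every input has unit norm.
    h = np.stack([np.cos(ts), np.sin(ts)], axis=1)
    fan_in = h.shape[1]
    for _ in range(depth):
        W = rng.normal(size=(fan_in, width)) * sigma_w / np.sqrt(fan_in)
        b = rng.normal(size=width) * sigma_b
        h = phi(h @ W + b)
        fan_in = width
    W_out = rng.normal(size=(fan_in, 1)) * sigma_w / np.sqrt(fan_in)
    return (h @ W_out).ravel()

leaky = lambda z, a=0.2: np.where(z > 0, z, a * z)  # nonsmooth only at zero
elu = lambda z: np.where(z > 0, z, np.expm1(z))     # C^1 but not C^2 at zero

ts = np.linspace(0.0, 2 * np.pi, 400)
for name, act in [("LeakyReLU", leaky), ("ELU", elu)]:
    path = random_wide_mlp(ts, act)
    # Crude roughness proxy: mean squared second difference of the sampled path.
    print(name, np.mean(np.diff(path, n=2) ** 2))
```

This is only a finite-width surrogate for the NNGP and a numerical proxy for regularity; the paper's results characterize sample-path smoothness analytically in the exact infinite-width limit.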