🤖 AI Summary
Existing deep learning theory focuses predominantly on the ReLU activation function, leaving activations that are nonsmooth only at zero, such as SELU, ELU, and LeakyReLU, largely uncharacterized under the neural tangent kernel (NTK) and neural network Gaussian process (NNGP) frameworks, particularly with respect to their reproducing kernel Hilbert space (RKHS) structure and equivalence across depths. Method: We develop a unified analytical framework combining functional analysis and stochastic process theory to characterize the kernel-space properties of such activations in the wide-network limit. Contribution/Results: We prove that a broad class of activations that are not infinitely smooth induces equivalent RKHSs across network depths, whereas polynomial activations do not, and we characterize the sample-path smoothness of the NNGPs induced by these activations. Our results extend the theoretical scope of NTK/NNGP analysis and provide a more general foundation for studying generalization and regularity in wide neural networks.
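As a concrete illustration of the kernels being studied (this is not code from the paper), the sketch below implements the standard layer-wise NNGP/NTK recursion for a fully connected network, estimating the Gaussian expectations by Monte Carlo so that any activation of the kind discussed above (LeakyReLU, ELU, SELU, ...) can be plugged in. The function names, hyperparameters, and sample counts are illustrative assumptions.

```python
import numpy as np

def nngp_ntk(x1, x2, phi, dphi, depth=3, sigma_w=1.0, sigma_b=0.1,
             n_mc=200_000, rng=None):
    """Monte Carlo estimate of the NNGP kernel and NTK between inputs x1, x2."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = x1.shape[0]
    # Input-layer covariances (first affine layer applied to the raw inputs).
    k11 = sigma_w**2 * (x1 @ x1) / d + sigma_b**2
    k12 = sigma_w**2 * (x1 @ x2) / d + sigma_b**2
    k22 = sigma_w**2 * (x2 @ x2) / d + sigma_b**2
    theta = k12
    for _ in range(depth):
        # Draw (u, v) from the bivariate Gaussian defined by the current covariances.
        cov = np.array([[k11, k12], [k12, k22]])
        u, v = rng.multivariate_normal(np.zeros(2), cov, size=n_mc).T
        k12_new = sigma_w**2 * np.mean(phi(u) * phi(v)) + sigma_b**2
        k11_new = sigma_w**2 * np.mean(phi(u) ** 2) + sigma_b**2
        k22_new = sigma_w**2 * np.mean(phi(v) ** 2) + sigma_b**2
        # NTK recursion: Theta^{l+1} = Sigma^{l+1} + sigma_w^2 E[phi'(u) phi'(v)] * Theta^l.
        theta = k12_new + sigma_w**2 * np.mean(dphi(u) * dphi(v)) * theta
        k11, k12, k22 = k11_new, k12_new, k22_new
    return k12, theta

# Hypothetical example: LeakyReLU with slope 0.2 (nonsmooth only at zero).
leaky = lambda z, a=0.2: np.where(z > 0, z, a * z)
dleaky = lambda z, a=0.2: np.where(z > 0, 1.0, a)

x1, x2 = np.array([1.0, 0.0]), np.array([0.6, 0.8])
K, Theta = nngp_ntk(x1, x2, leaky, dleaky, depth=3)
print(f"NNGP kernel ~ {K:.4f}, NTK ~ {Theta:.4f}")
```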
📝 Abstract
While the theory of deep learning has made some progress in recent years, much of it is limited to the ReLU activation function. In particular, while the neural tangent kernel (NTK) and neural network Gaussian process kernel (NNGP) have given theoreticians tractable limiting cases of fully connected neural networks, their properties are poorly understood for most activation functions other than powers of the ReLU function. Our main contribution is to provide a more general characterization of the RKHS of these kernels for typical activation functions whose only non-smoothness is at zero, such as SELU, ELU, or LeakyReLU. Our analysis also covers a broad set of special cases, such as networks without biases, two-layer networks, and polynomial activations. Our results show that a broad class of activations that are not infinitely smooth generates equivalent RKHSs at different network depths, while polynomial activations generate non-equivalent RKHSs. Finally, we derive results for the smoothness of NNGP sample paths, characterizing the smoothness of infinitely wide neural networks at initialization.
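The last claim concerns sample paths of the NNGP, i.e., the random functions computed by infinitely wide networks at initialization. A rough way to visualize this, again only a sketch with assumed width, depth, and variance hyperparameters rather than anything taken from the paper, is to evaluate a single very wide, randomly initialized MLP on a one-dimensional input curve and compare the roughness of the resulting paths across activations.

```python
import numpy as np

def random_wide_mlp(ts, phi, depth=3, width=4096, sigma_w=1.4, sigma_b=0.1, seed=0):
    """Evaluate one randomly initialized fully connected network on a 1D input curve.

    Uses the standard NTK-style parameterization (weights scaled by sigma_w / sqrt(fan_in)),
    so as width grows the output at initialization approximates an NNGP sample path.
    """
    rng = np.random.default_rng(seed)
    # Embed the scalar parameter t as (cos t, sin t) so every input has unit norm.
    h = np.stack([np.cos(ts), np.sin(ts)], axis=1)
    fan_in = h.shape[1]
    for _ in range(depth):
        W = rng.normal(size=(fan_in, width)) * sigma_w / np.sqrt(fan_in)
        b = rng.normal(size=width) * sigma_b
        h = phi(h @ W + b)
        fan_in = width
    W_out = rng.normal(size=(fan_in, 1)) * sigma_w / np.sqrt(fan_in)
    return (h @ W_out).ravel()

leaky = lambda z, a=0.2: np.where(z > 0, z, a * z)  # nonsmooth only at zero
elu = lambda z: np.where(z > 0, z, np.expm1(z))     # C^1 but not C^2 at zero

ts = np.linspace(0.0, 2 * np.pi, 400)
for name, act in [("LeakyReLU", leaky), ("ELU", elu)]:
    path = random_wide_mlp(ts, act)
    # Crude roughness proxy: mean squared second difference of the sampled path.
    print(name, np.mean(np.diff(path, n=2) ** 2))
```

This is only a finite-width surrogate for the NNGP and a numerical proxy for regularity; the paper's results characterize sample-path smoothness analytically in the exact infinite-width limit.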