🤖 AI Summary
Activation function selection has long relied on heuristics rather than rigorous theory, particularly regarding the trade-off between stability and expressivity.
Method: We propose a nine-dimensional integral feature system that unifies propagation and kernel properties to jointly characterize expressivity and dynamical stability. The framework integrates Gaussian propagation statistics, Lyapunov dynamical analysis, dimension-free Hessian bounds, and total-variation-based smoothness measures, yielding an affine-reparameterization-invariant classification scheme with provable dynamical-stability guarantees.
Results: The theory identifies a precise variance-stabilizing region, enabling a sharp categorization of saturating, linearly growing, and smooth activation functions. The classification of eight mainstream activations, including ReLU, Swish, and GELU, aligns closely with Gauss–Hermite quadrature and Monte Carlo numerical validation. This work establishes the first theoretically grounded criterion for activation function selection with provable stability guarantees.
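The quadrature-versus-Monte-Carlo validation mentioned above can be sketched minimally. This assumes that the Gaussian propagation statistics m1 and m2 denote the first and second moments E[phi(Z)] and E[phi(Z)^2] under Z ~ N(0,1); the paper's exact definitions may differ. For ReLU both moments have closed forms (1/sqrt(2*pi) and 1/2), so the two numerical routes can be checked against each other:

```python
import numpy as np

def gh_expectation(f, n=64):
    # E[f(Z)] for Z ~ N(0, 1) via Gauss-Hermite quadrature:
    # the substitution z = sqrt(2) * x maps the Hermite weight
    # exp(-x^2) onto the standard normal density.
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

relu = lambda z: np.maximum(z, 0.0)

# Quadrature estimates of the (assumed) first two Gaussian moments
m1 = gh_expectation(relu)
m2 = gh_expectation(lambda z: relu(z) ** 2)

# Monte Carlo cross-check with 10^6 standard-normal samples
rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)
m1_mc, m2_mc = relu(z).mean(), (relu(z) ** 2).mean()

# Closed forms for ReLU: E[ReLU(Z)] = 1/sqrt(2*pi), E[ReLU(Z)^2] = 1/2
```

Note that Gauss–Hermite quadrature converges more slowly for kinked activations such as ReLU than for smooth ones like GELU, which is one reason a Monte Carlo cross-check is useful.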
📝 Abstract
Activation functions govern the expressivity and stability of neural networks, yet existing comparisons remain largely heuristic. We propose a rigorous framework for their classification via a nine-dimensional integral signature S_sigma(phi), combining Gaussian propagation statistics (m1, g1, g2, m2, eta), asymptotic slopes (alpha_plus, alpha_minus), and regularity measures (TV(phi'), C(phi)). This taxonomy establishes well-posedness, affine reparameterization laws with bias, and closure under bounded slope variation. Dynamical analysis yields Lyapunov theorems with explicit descent constants and identifies variance stability regions through (m2', g2). From a kernel perspective, we derive dimension-free Hessian bounds and connect smoothness to bounded variation of phi'. Applying the framework, we classify eight standard activations (ReLU, leaky-ReLU, tanh, sigmoid, Swish, GELU, Mish, TeLU), proving sharp distinctions between saturating, linear-growth, and smooth families. Numerical Gauss–Hermite and Monte Carlo validation confirms theoretical predictions. Our framework provides principled design guidance, moving activation choice from trial-and-error to provable stability and kernel conditioning.
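The variance-stability analysis via (m2', g2) can be illustrated under one plausible reading: take m2(q) = E[phi(sqrt(q) Z)^2] as the forward variance map of a layer and call a fixed point q* stable when the slope m2'(q*) has magnitude below 1. These definitions of m2 and m2' are assumptions made for illustration, not the paper's exact ones. A minimal sketch for tanh:

```python
import numpy as np

def variance_map(phi, q, n=80):
    # m2(q) = E[phi(sqrt(q) * Z)^2] for Z ~ N(0, 1),
    # computed by Gauss-Hermite quadrature (z = sqrt(2q) * x).
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * phi(np.sqrt(2.0 * q) * x) ** 2) / np.sqrt(np.pi)

# Iterate the map from q0 = 1.0; since tanh(z)^2 < z^2, we have
# m2(q) < q, so the variance contracts toward its fixed point.
q = 1.0
for _ in range(100):
    q = variance_map(np.tanh, q)

# Central-difference estimate of the slope m2'(q) near the
# (approximate) fixed point; a slope below 1 indicates a stable
# variance regime in this reading.
h = 1e-4
slope = (variance_map(np.tanh, q + h) - variance_map(np.tanh, q - h)) / (2.0 * h)
```

The same iteration run for a linear-growth activation such as leaky-ReLU would instead rescale the variance geometrically, which is the kind of distinction the signature's (m2', g2) coordinates are meant to capture.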