🤖 AI Summary
This work investigates the large-deviation behaviour of deep neural networks with i.i.d. Gaussian weights and unbounded, at most linearly growing activation functions, the practically dominant example being ReLU. Existing large-deviation theory for such networks applies only to bounded continuous activations; the authors establish a large deviation principle that extends to the ReLU case. The proof combines tools from random matrix theory, Gaussian process analysis, and power-series expansions, and yields a simplified expression for the rate function together with an explicit power-series representation of it for ReLU. This brings the theory in line with the activations actually used in modern architectures and provides a quantitative tool with potential applications to characterising generalisation, designing principled weight-initialisation schemes, and analysing training dynamics.
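For orientation, here is a minimal NumPy sketch (not taken from the paper) of the kind of random network in question: fully connected layers with i.i.d. Gaussian weights and the ReLU activation. The widths, depth, and the 1/√(fan-in) variance scaling are illustrative assumptions; the abstract does not specify them.

```python
# Minimal sketch of the model class: a depth-L fully connected network with
# i.i.d. Gaussian weights and ReLU, an activation that grows linearly and
# hence falls outside earlier LDP results for bounded activations.
import numpy as np

def relu(x):
    # ReLU is continuous and at most linearly growing, but unbounded.
    return np.maximum(x, 0.0)

def forward(x, widths, rng):
    """One random network draw: i.i.d. N(0, 1/n_in) weights per layer,
    ReLU applied between layers (assumed scaling, for illustration)."""
    h = x
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        h = relu(W @ h)
    return h

rng = np.random.default_rng(0)
out = forward(np.ones(100), widths=[100, 100, 100, 10], rng=rng)
```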
📝 Abstract
We prove a large deviation principle for deep neural networks with Gaussian weights and (at most linearly growing) activation functions. This generalises earlier work, in which bounded and continuous activation functions were considered. In practice, linearly growing activation functions such as ReLU are most commonly used. We furthermore simplify previous expressions for the rate function and give a power-series expansion for the ReLU case.
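For readers less familiar with the terminology, the standard textbook formulation of a large deviation principle (in the sense of Dembo and Zeitouni) is sketched below for orientation only; the paper's precise state space, topology, and speed sequence may differ.

```latex
% Standard form of a large deviation principle (textbook definition,
% included for orientation; not the paper's exact statement).
A sequence of probability measures $(\mu_n)_{n \ge 1}$ on a topological
space $\mathcal{X}$ satisfies a large deviation principle with speed $n$
and rate function $I \colon \mathcal{X} \to [0,\infty]$ if, for every
Borel set $A \subseteq \mathcal{X}$,
\[
  -\inf_{x \in A^{\circ}} I(x)
  \;\le\; \liminf_{n\to\infty} \frac{1}{n} \log \mu_n(A)
  \;\le\; \limsup_{n\to\infty} \frac{1}{n} \log \mu_n(A)
  \;\le\; -\inf_{x \in \overline{A}} I(x),
\]
where $A^{\circ}$ and $\overline{A}$ denote the interior and closure of $A$.
```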