Large Deviations of Gaussian Neural Networks with ReLU activation

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
🤖 AI Summary
This work investigates the large-deviation behavior of deep neural networks with i.i.d. Gaussian weights under unbounded, at most linearly growing activation functions, most notably ReLU. Existing large-deviation theory applies only to bounded continuous activations; this paper establishes a rigorous large deviation principle for the ReLU case. The method combines tools from random matrix theory, Gaussian process analysis, and power-series expansions to derive a simplified, closed-form rate function, including an explicit power-series representation tailored to ReLU. The theoretical results are consistent with empirical observations in modern deep architectures. This framework provides an analytical tool for quantitatively characterizing neural network behavior, informing principled weight-initialization schemes, and studying training dynamics, thereby advancing the theoretical foundations of deep learning.

📝 Abstract
We prove a large deviation principle for deep neural networks with Gaussian weights and (at most linearly growing) activation functions. This generalises earlier work, in which bounded and continuous activation functions were considered. In practice, linearly growing activation functions such as ReLU are most commonly used. We furthermore simplify previous expressions for the rate function and give power-series expansions for the ReLU case.
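The setting of the abstract can be illustrated numerically. The sketch below is not the paper's construction: it samples wide, depth-`depth` networks with i.i.d. Gaussian weights and ReLU activation, then estimates a tail probability for the scalar output by Monte Carlo. The `1/sqrt(n)` weight scaling and the all-ones input are illustrative assumptions.

```python
import numpy as np

def relu_network_output(n, depth, rng):
    """One sample of a width-n, depth-`depth` Gaussian ReLU network on a fixed input."""
    x = np.ones(n)  # fixed illustrative input
    for _ in range(depth):
        # i.i.d. N(0, 1/n) weights; ReLU grows at most linearly, matching the paper's setting
        W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
        x = np.maximum(W @ x, 0.0)
    # scalar Gaussian readout layer
    w = rng.normal(0.0, 1.0 / np.sqrt(n), size=n)
    return w @ x

def tail_probability(n, depth, threshold, num_samples, seed=0):
    """Monte Carlo estimate of P(output > threshold) over the random weights."""
    rng = np.random.default_rng(seed)
    samples = np.array([relu_network_output(n, depth, rng) for _ in range(num_samples)])
    return float(np.mean(samples > threshold))
```

A large deviation principle predicts that such tail probabilities decay exponentially in the width, at a speed governed by the rate function; comparing `tail_probability` across increasing `n` gives a crude empirical view of that decay.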
Problem

Research questions and friction points this paper is trying to address.

Studies large deviations in Gaussian neural networks with ReLU activation
Generalizes prior work restricted to bounded continuous activation functions
Simplifies rate function expressions for ReLU networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves large deviations for Gaussian ReLU networks
Simplifies rate function expressions
Provides power-series expansions for ReLU
Quirin Vogel
Ludwig-Maximilians-Universität München, Mathematisches Institut, München, Germany