🤖 AI Summary
This work investigates the large-deviation behaviour of deep neural networks with i.i.d. Gaussian weights and unbounded, at most linearly growing activation functions, the practically dominant example being ReLU. Existing large-deviation theory for such networks applies only to bounded continuous activations; the authors establish a large deviation principle that extends to the ReLU case. The proof combines tools from random matrix theory, Gaussian process analysis, and power-series expansions, and yields a simplified expression for the rate function together with an explicit power-series representation of it for ReLU. This brings the theory in line with the activations actually used in modern architectures and provides a quantitative tool with potential applications to characterising generalisation, designing principled weight-initialisation schemes, and analysing training dynamics.
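For orientation, here is a minimal NumPy sketch (not taken from the paper) of the kind of random network in question: fully connected layers with i.i.d. Gaussian weights and the ReLU activation. The widths, depth, and the 1/√(fan-in) variance scaling are illustrative assumptions; the abstract does not specify them.

```python
# Minimal sketch of the model class: a depth-L fully connected network with
# i.i.d. Gaussian weights and ReLU, an activation that grows linearly and
# hence falls outside earlier LDP results for bounded activations.
import numpy as np

def relu(x):
    # ReLU is continuous and at most linearly growing, but unbounded.
    return np.maximum(x, 0.0)

def forward(x, widths, rng):
    """One random network draw: i.i.d. N(0, 1/n_in) weights per layer,
    ReLU applied between layers (assumed scaling, for illustration)."""
    h = x
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        h = relu(W @ h)
    return h

rng = np.random.default_rng(0)
out = forward(np.ones(100), widths=[100, 100, 100, 10], rng=rng)
```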
📝 Abstract
We prove a large deviation principle for deep neural networks with Gaussian weights and (at most linearly growing) activation functions. This generalises earlier work, in which bounded and continuous activation functions were considered. In practice, linearly growing activation functions such as ReLU are most commonly used. We furthermore simplify previous expressions for the rate function and give a power-series expansion for the ReLU case.
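For readers less familiar with the terminology, the standard textbook formulation of a large deviation principle (in the sense of Dembo and Zeitouni) is sketched below for orientation only; the paper's precise state space, topology, and speed sequence may differ.

```latex
% Standard form of a large deviation principle (textbook definition,
% included for orientation; not the paper's exact statement).
A sequence of probability measures $(\mu_n)_{n \ge 1}$ on a topological
space $\mathcal{X}$ satisfies a large deviation principle with speed $n$
and rate function $I \colon \mathcal{X} \to [0,\infty]$ if, for every
Borel set $A \subseteq \mathcal{X}$,
\[
  -\inf_{x \in A^{\circ}} I(x)
  \;\le\; \liminf_{n\to\infty} \frac{1}{n} \log \mu_n(A)
  \;\le\; \limsup_{n\to\infty} \frac{1}{n} \log \mu_n(A)
  \;\le\; -\inf_{x \in \overline{A}} I(x),
\]
where $A^{\circ}$ and $\overline{A}$ denote the interior and closure of $A$.
```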