🤖 AI Summary
This work studies the large-width asymptotics of gradient descent with dropout on two-layer neural networks, focusing on how the dropout rate, the learning rate, and the network width jointly shape the limiting training dynamics. Working in a mean-field framework, we rigorously derive the limiting evolution equations, in both path space and distribution space, using tools from particle systems and stochastic-process theory; in the limit, each neuron's update times follow an independent Poisson or Bernoulli clock. We identify five distinct nondegenerate training phases and show that the classical regularization ("penalty") effect of dropout persists only at impractically small learning rates of order $O(1/\text{width})$; at larger learning rates, dropout instead acts as a "random geometry" method in which gradients are randomly thinned after the forward and backward passes. Together, these results give a first phase-diagram description of dropout in wide networks, characterizing how convergence and dynamical behavior depend on the relative scaling of the dropout rate, learning rate, and width.
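To make the "clock" picture concrete, here is a minimal simulation sketch (an assumed illustration, not code from the paper): each neuron gets its own independent clock, Poisson when the learning rate vanishes (exponential waiting times between updates) and Bernoulli when it does not (each discrete step fires with some probability). The rate, horizon, and probability values below are made up.

```python
import numpy as np

# Illustrative sketch of per-neuron update clocks; all constants are assumptions.
rng = np.random.default_rng(1)
n_neurons, horizon, rate, keep_prob, n_steps = 4, 5.0, 2.0, 0.5, 10

# Poisson clocks: accumulate exponential waiting times up to the horizon.
poisson_clocks = []
for _ in range(n_neurons):
    t, times = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate)  # mean waiting time 1/rate
        if t > horizon:
            break
        times.append(t)
    poisson_clocks.append(times)

# Bernoulli clocks: at each discrete step, the neuron updates with prob. keep_prob.
bernoulli_clocks = [np.flatnonzero(rng.random(n_steps) < keep_prob)
                    for _ in range(n_neurons)]

for i in range(n_neurons):
    print(f"neuron {i}: Poisson times {np.round(poisson_clocks[i], 2)}, "
          f"Bernoulli steps {bernoulli_clocks[i]}")
```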
📝 Abstract
Dropout is a standard training technique for neural networks that consists of randomly deactivating units at each step of their gradient-based training. It is known to improve performance in many settings, including in the large-scale training of language or vision models. As a first step towards understanding the role of dropout in large neural networks, we study the large-width asymptotics of gradient descent with dropout on two-layer neural networks with the mean-field initialization scale. We obtain a rich asymptotic phase diagram that exhibits five distinct nondegenerate phases depending on the relative magnitudes of the dropout rate, the learning rate, and the width. Notably, we find that the well-studied "penalty" effect of dropout only persists in the limit with impractically small learning rates of order $O(1/\text{width})$. For larger learning rates, this effect disappears and in the limit, dropout is equivalent to a "random geometry" technique, where the gradients are thinned randomly after the forward and backward pass have been computed. In this asymptotic regime, the limit is described by a mean-field jump process where the neurons' update times follow independent Poisson or Bernoulli clocks (depending on whether the learning rate vanishes or not). For some of the phases, we obtain a description of the limit dynamics both in path-space and in distribution-space. The convergence proofs involve a mix of tools from mean-field particle systems and stochastic processes. Together, our results lay the groundwork for a renewed theoretical understanding of dropout in large-scale neural networks.
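To make the contrast in the abstract concrete, the following is a minimal sketch (plain NumPy, not the paper's code) of one gradient step on a two-layer network with mean-field $1/m$ output scaling. It compares (a) standard dropout, where units are deactivated before the forward and backward passes, with (b) the "random geometry" variant, where the full gradient is computed first and the per-neuron updates are then thinned at random. All names and constants (`m`, `lr`, `q`, the inverted-dropout $1/q$ scaling, the squared loss) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, lr, q = 256, 5, 0.1, 0.5          # width, input dim, learning rate, keep prob. (assumed)
w = rng.normal(size=(m, d))             # first-layer weights
a = rng.normal(size=m)                  # second-layer weights
x = rng.normal(size=d)                  # a single input
y = 1.0                                 # its target

def forward(w, a, mask):
    """f(x) = (1/m) * sum_i a_i * relu(w_i . x), with masked units removed."""
    h = np.maximum(w @ x, 0.0) * mask
    return (a * h).sum() / m, h

# (a) standard dropout: sample the mask, then run the forward/backward pass with it.
mask = (rng.random(m) < q) / q          # inverted-dropout scaling by 1/q (assumed convention)
pred, h = forward(w, a, mask)
resid = pred - y                        # gradient of the squared loss 0.5*(pred - y)^2
grad_a = resid * h / m
grad_w = resid * (a * mask * (w @ x > 0))[:, None] * x / m
a_dropout = a - lr * grad_a
w_dropout = w - lr * grad_w

# (b) "random geometry" thinning: full forward/backward pass first, then thin the updates.
pred_full, h_full = forward(w, a, np.ones(m))
resid_full = pred_full - y
grad_a_full = resid_full * h_full / m
grad_w_full = resid_full * (a * (w @ x > 0))[:, None] * x / m
thin = (rng.random(m) < q) / q          # per-neuron Bernoulli thinning of the update
a_thin = a - lr * thin * grad_a_full
w_thin = w - lr * thin[:, None] * grad_w_full
```

In (a) the mask enters the forward and backward computation itself, which is where the "penalty" interpretation comes from; in (b) the mask only decides which neurons' already-computed updates are applied, matching the gradient-thinning description above.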