Controlled Langevin Dynamics for Sampling of Feedforward Neural Networks Trained with Minibatches

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the computational intractability of exact sampling methods—such as Hybrid Monte Carlo—for large-scale neural networks, which stems from their reliance on full-batch gradient evaluations to sample from the Boltzmann distribution. The authors propose a controlled minibatch pseudo-Langevin dynamics approach that leverages the statistical properties of minibatch gradient noise, adjusting virtual mass and friction coefficients to dramatically improve computational efficiency while preserving sampling accuracy. Grounded in stochastic differential equations and equilibrium distribution theory, the method enables scalable sampling in high-dimensional parameter spaces. Experiments demonstrate that on networks with millions of parameters, the samples generated at moderate temperatures achieve generalization performance comparable to that of stochastic gradient descent, without requiring a validation set or early stopping.

📝 Abstract
Sampling the parameter space of artificial neural networks according to a Boltzmann distribution provides insight into the geometry of low-loss solutions and offers an alternative to conventional loss minimization for training. However, exact sampling methods such as hybrid Monte Carlo (hMC), while formally correct, become computationally prohibitive for realistic datasets because they require repeated evaluation of full-batch gradients. We introduce a pseudo-Langevin (pL) dynamics that enables efficient Boltzmann sampling of feed-forward neural networks trained with large datasets by using minibatches in a controlled manner. The method exploits the statistical properties of minibatch gradient noise and adjusts fictitious masses and friction coefficients to ensure that the induced stochastic process efficiently samples the desired equilibrium distribution. We validate the approach numerically by comparing its equilibrium statistics with those obtained from exact hMC sampling. Performance benchmarks demonstrate that, while hMC rapidly becomes inefficient as network size increases, the pL scheme maintains high computational diffusion and scales favorably to networks with over one million parameters. Finally, we show that sampling at intermediate temperatures yields optimal generalization performance, comparable to SGD, without requiring a validation set or early stopping procedure. These results establish controlled minibatch Langevin dynamics as a practical and scalable tool for exploring and exploiting the solution space of large neural networks.
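To make the abstract concrete, here is a minimal sketch of the kind of minibatch-driven underdamped Langevin update it describes, applied to a toy quadratic loss rather than a neural network. All names, the discretization, and parameter values (mass, friction, temperature, step size) are illustrative assumptions, not the authors' scheme; the noise amplitude follows the standard fluctuation-dissipation relation and ignores the minibatch-noise correction that the paper's controlled dynamics is designed to handle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-sample loss L_i(w) = 0.5 * ||w - x_i||^2, so the minibatch
# gradient is w - mean(x_i over the batch).  Illustrative only.
X = rng.normal(size=(1000, 10))

def minibatch_grad(w, batch):
    return w - X[batch].mean(axis=0)

def pseudo_langevin(steps=5000, batch_size=100, mass=1.0,
                    friction=1.0, temperature=0.1, dt=0.01):
    """Underdamped Langevin sampler driven by minibatch gradients.

    Momentum update: p <- p - dt*(grad + friction*p/mass) + noise,
    with noise variance 2*friction*temperature*dt per coordinate
    (fluctuation-dissipation).  A sketch, not the paper's pL method.
    """
    w = np.zeros(X.shape[1])
    p = np.zeros_like(w)
    samples = []
    for t in range(steps):
        batch = rng.choice(len(X), size=batch_size, replace=False)
        g = minibatch_grad(w, batch)
        noise = rng.normal(size=w.shape) * np.sqrt(2 * friction * temperature * dt)
        p += -dt * (g + friction * p / mass) + noise
        w += dt * p / mass
        if t > steps // 2:          # discard the first half as burn-in
            samples.append(w.copy())
    return np.array(samples)

samples = pseudo_langevin()
# At equilibrium, w fluctuates around the data mean with spread set
# by the temperature (plus extra spread from minibatch noise).
```

In this toy setting the equilibrium distribution is approximately Gaussian around the data mean, which makes it easy to check that the chain equilibrates; the paper's contribution is to retain this behavior at scale by tuning the fictitious masses and friction against the statistics of the minibatch gradient noise.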
Problem

Research questions and friction points this paper is trying to address.

Boltzmann sampling
neural networks
minibatch gradients
computational efficiency
parameter space exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

pseudo-Langevin dynamics
minibatch sampling
Boltzmann distribution
scalable neural network sampling
controlled Langevin dynamics
Alessandro Zambon
Department of Physics, Università degli Studi di Milano and INFN, via Celoria 16, 20133 Milano, Italy
Francesca Caruso
Department of Computing Sciences and Bocconi Institute for Data Science and Analytics (BIDSA), Bocconi University, 20136 Milano, Italy
Riccardo Zecchina
professor, theoretical physics, Bocconi University
statistical physics, optimisation and inference, machine learning, computational biology, computational neuroscience
Guido Tiana
University of Milano
protein folding, computational physics, physics of complex systems, protein aggregation, molecular evolution