🤖 AI Summary
This work addresses the scalability limitations of microcanonical Langevin dynamics in large-scale Bayesian deep learning, which stem from its reliance on full-batch gradients and from the unclear role of mini-batch gradient noise in its dynamics. The authors develop a continuous-time theoretical framework for stochastic-gradient microcanonical dynamics, revealing a bias induced by anisotropic gradient noise as well as numerical instabilities in high-dimensional posteriors. To mitigate these issues, they propose gradient-noise preconditioning, which reduces the bias, and introduce an adaptive tuner based on energy variance that automatically selects step sizes while ensuring numerical stability. The resulting SMILE sampler delivers efficient, robust, state-of-the-art inference in high-dimensional settings such as Bayesian neural networks.
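To make these components concrete, here is a minimal NumPy sketch of a single preconditioned stochastic-gradient microcanonical Langevin step. The function names, the diagonal form of the gradient-noise preconditioner, and the refresh strength `noise_scale` are illustrative assumptions, not the paper's exact design; the momentum update is the standard exact update on the unit sphere used in microcanonical dynamics, and the final partial momentum refresh supplies the Langevin noise.

```python
import numpy as np

def smile_step(x, u, minibatch_grad, step_size, precond_diag, noise_scale, rng):
    """Sketch of one stochastic microcanonical Langevin step.

    x              -- position, shape (d,)
    u              -- unit-norm momentum direction, shape (d,)
    minibatch_grad -- callable returning a mini-batch estimate of the
                      gradient of the negative log-posterior at x
    precond_diag   -- diagonal gradient-noise preconditioner (assumed form)
    noise_scale    -- strength of the partial momentum refresh
    """
    d = x.size
    # Preconditioned mini-batch gradient (illustrative diagonal preconditioner).
    g = precond_diag * minibatch_grad(x)
    g_norm = np.linalg.norm(g)
    e = -g / g_norm
    # Exact momentum update on the unit sphere (microcanonical part).
    delta = step_size * g_norm / (d - 1)
    ue = np.dot(u, e)
    u = (u + (np.sinh(delta) + ue * (np.cosh(delta) - 1.0)) * e) \
        / (np.cosh(delta) + ue * np.sinh(delta))
    # Position update along the momentum direction.
    x = x + step_size * u
    # Partial momentum refresh: the Langevin noise, projected back to the sphere.
    u = u + noise_scale * rng.standard_normal(d)
    u = u / np.linalg.norm(u)
    return x, u
```

In a full sampler this step would run in a loop over mini-batches, with `precond_diag` estimated from running statistics of the gradient noise and `noise_scale` tied to the step size and the desired momentum decoherence length.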
📝 Abstract
Scaling inference methods such as Markov chain Monte Carlo to high-dimensional models remains a central challenge in Bayesian deep learning. A promising recent proposal, microcanonical Langevin Monte Carlo, has shown state-of-the-art performance across a wide range of problems; however, its reliance on full-dataset gradients makes it prohibitively expensive at scale. This paper addresses a fundamental question: can microcanonical dynamics effectively leverage mini-batch gradient noise? We provide the first systematic study of this problem, establishing a novel continuous-time theoretical analysis of stochastic-gradient microcanonical dynamics. We reveal two critical failure modes: a theoretically derived bias due to anisotropic gradient noise, and numerical instabilities in complex high-dimensional posteriors. To tackle these issues, we propose a principled gradient-noise preconditioning scheme shown to significantly reduce this bias, and we develop a novel energy-variance-based adaptive tuner that automates step-size selection and dynamically informs numerical guardrails. The resulting algorithm is a robust and scalable microcanonical Monte Carlo sampler that achieves state-of-the-art performance on challenging high-dimensional inference tasks such as Bayesian neural networks. Combined with recent ensemble techniques, our work unlocks a new class of stochastic microcanonical Langevin ensemble (SMILE) samplers for large-scale Bayesian inference.
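As a rough illustration of the energy-variance-based tuning idea, the sketch below adjusts the step size so that the variance of per-step energy changes, normalized per dimension, tracks a target value. The target value, the multiplicative update rule, and the cap are assumptions for illustration, not the paper's exact tuner.

```python
import numpy as np

def tune_step_size(step_size, energy_changes, dim,
                   target_evar=1e-3, adapt_rate=0.25, max_step=None):
    """Sketch of an energy-variance-based step-size tuner (assumed rule).

    energy_changes -- recent per-step changes in the sampler's energy
    dim            -- dimensionality of the posterior
    target_evar    -- target energy variance per dimension (illustrative value)
    """
    evar = np.var(energy_changes) / dim
    # Shrink the step if energy fluctuations are too large (instability),
    # grow it if the dynamics are overly conservative.
    step_size *= (target_evar / max(evar, 1e-12)) ** adapt_rate
    if max_step is not None:
        step_size = min(step_size, max_step)  # numerical guardrail
    return step_size
```

In practice, `energy_changes` would be collected over a short warmup window, and a cap like `max_step` could serve as the kind of dynamically informed numerical guardrail the abstract describes.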