🤖 AI Summary
Sampling-based inference for Bayesian neural networks (BNNs) faces core challenges: hard-to-navigate multimodal posteriors, long inference times, and unpredictable computational cost. This paper proposes Microcanonical Langevin Ensembles, an approach that combines strategies from optimization with the recently proposed Microcanonical Langevin Monte Carlo (MCLMC) sampler, running many chains in parallel and modifying MCLMC to improve sampling robustness and make resource requirements more predictable, which in turn eases parallelization. Evaluated across diverse tasks and data modalities, the method delivers speedups of up to an order of magnitude over approaches based on the state-of-the-art No-U-Turn Sampler (NUTS) while maintaining or improving predictive accuracy and uncertainty calibration. This substantially lowers the practical barrier to deploying BNNs in real-world applications.
📝 Abstract
Despite recent advances, sampling-based inference for Bayesian Neural Networks (BNNs) remains a significant challenge in probabilistic deep learning. While sampling-based approaches do not require a variational distribution assumption, current state-of-the-art samplers still struggle to navigate the complex and highly multimodal posteriors of BNNs. As a consequence, sampling still requires considerably longer inference times than non-Bayesian methods even for small neural networks, and even with recent improvements in software efficiency. Beyond the difficulty of finding high-probability regions, the time until samplers sufficiently explore these regions remains unpredictable. To tackle these challenges, we introduce an ensembling approach that leverages strategies from optimization and a recently proposed sampler called Microcanonical Langevin Monte Carlo (MCLMC) for efficient, robust, and predictable sampling performance. Compared to approaches based on the state-of-the-art No-U-Turn Sampler, our approach delivers substantial speedups of up to an order of magnitude, while maintaining or improving predictive performance and uncertainty quantification across diverse tasks and data modalities. The proposed Microcanonical Langevin Ensembles and modifications to MCLMC additionally enhance the method's predictability in resource requirements, facilitating easier parallelization. Overall, the proposed method offers a promising direction for practical, scalable inference for BNNs.
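The ensembling idea described above, running many independent chains in parallel and pooling their states rather than relying on one long chain, can be illustrated with a deliberately simplified sketch. The snippet below uses a plain unadjusted Langevin step on a toy Gaussian target; it is not the paper's MCLMC integrator (which evolves energy-conserving microcanonical dynamics with adaptive step sizes), only a minimal, self-contained illustration of the parallel-chain structure, with all function and parameter names chosen here for the example:

```python
import numpy as np


def langevin_ensemble(grad_logp, n_chains, n_steps, step, dim=1, rng=None):
    """Run an ensemble of unadjusted Langevin chains in parallel.

    Each chain follows x <- x + step * grad_logp(x) + sqrt(2 * step) * noise.
    All chains are updated simultaneously via vectorized NumPy, mirroring how
    an ensemble makes the cost per iteration fixed and easy to parallelize.
    Returns the final states of all chains, shape (n_chains, dim).
    """
    rng = rng or np.random.default_rng()
    x = rng.normal(size=(n_chains, dim))  # overdispersed initialization
    noise_scale = np.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x + step * grad_logp(x) + noise_scale * rng.normal(size=x.shape)
    return x


if __name__ == "__main__":
    # Toy target: standard normal, so grad log p(x) = -x.
    samples = langevin_ensemble(
        lambda x: -x, n_chains=2048, n_steps=300, step=0.1,
        rng=np.random.default_rng(0),
    )
    print(samples.mean(), samples.std())
```

Because every chain runs for the same, fixed number of steps, the total compute is known in advance, which is the kind of resource predictability the abstract highlights; the trade-off is that plain unadjusted Langevin has a step-size-dependent bias that the paper's methods are designed to control.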