🤖 AI Summary
Bayesian learning with modern machine learning models is computationally challenging at scale because it requires approximating a high-dimensional posterior distribution. This paper addresses that challenge in three ways. First, it presents a tempered framing of stochastic gradient MCMC that transitions seamlessly into optimization and reveals a minor modification to deep ensembles that makes them asymptotically unbiased for the Bayesian posterior. Second, it introduces Posteriors, an easily extensible open-source PyTorch library of general-purpose implementations under a unified optimization-and-sampling interface, making Bayesian learning accessible and scalable to large data and parameter regimes. Third, it demonstrates and compares the utility of Bayesian approximations through experiments, including an investigation into the cold posterior effect and applications with large language models (LLMs), with attention to uncertainty calibration and out-of-distribution robustness.
📝 Abstract
Although theoretically compelling, Bayesian learning with modern machine learning models is computationally challenging since it requires approximating a high-dimensional posterior distribution. In this work, we (i) introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations that make Bayesian learning accessible and scalable to large data and parameter regimes; (ii) present a tempered framing of stochastic gradient Markov chain Monte Carlo, as implemented in posteriors, that transitions seamlessly into optimization and unveils a minor modification to deep ensembles to ensure they are asymptotically unbiased for the Bayesian posterior; and (iii) demonstrate and compare the utility of Bayesian approximations through experiments, including an investigation into the cold posterior effect and applications with large language models.
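To make the tempered framing concrete, here is a minimal sketch of tempered stochastic gradient Langevin dynamics on a scalar parameter. This is not the posteriors API; the function name and setup are illustrative. The key point from the abstract is the seamless transition: at temperature 1 the update is a standard SGLD sampling step, while at temperature 0 the noise term vanishes and the update reduces to plain gradient-ascent optimization.

```python
# Illustrative sketch (not the posteriors API): tempered stochastic gradient
# Langevin dynamics. The temperature interpolates between posterior sampling
# (temperature = 1) and pure gradient-ascent optimization (temperature = 0).
import math
import random

def tempered_sgld_step(theta, grad_log_post, step_size, temperature):
    """One tempered SGLD update:

    theta' = theta + step_size * grad_log_post(theta)
             + sqrt(2 * step_size * temperature) * standard_normal_noise
    """
    noise = random.gauss(0.0, 1.0)
    return (theta
            + step_size * grad_log_post(theta)
            + math.sqrt(2.0 * step_size * temperature) * noise)

# Toy posterior: standard normal, log p(theta) = -theta**2 / 2,
# so grad log p(theta) = -theta and the posterior mode is at 0.
grad_log_post = lambda t: -t

random.seed(0)
theta = 5.0
for _ in range(2000):
    theta = tempered_sgld_step(theta, grad_log_post,
                               step_size=0.01, temperature=0.0)
# At temperature 0 the noise term is zero, so the loop is deterministic
# gradient ascent on the log-posterior and theta converges to the mode at 0.
```

Setting the temperature strictly between 0 and 1 gives the cold-posterior regime investigated in the experiments, where the chain concentrates more tightly around the mode than the exact Bayesian posterior.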