SMC Is All You Need: Parallel Strong Scaling

📅 2024-02-09
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
To address the unbounded time complexity and poor parallel scalability of conventional SMC/MCMC methods in Bayesian deep learning, this paper proposes a fully parallelized sequential Monte Carlo (pSMC) framework. The method achieves provable strong scaling: mean squared error (MSE) decays as $O(1/NP)$, with bounded per-step computational cost and no efficiency loss as the number of processors $P \to \infty$. Key elements include asynchronously distributed SMC runs, adaptive resampling, low-cost inter-node sample communication, and a rigorous convergence analysis. Experiments across multiple Bayesian inference tasks show that pSMC significantly outperforms state-of-the-art MCMC methods, attaining the optimal $O(\varepsilon^{-2})$ computational complexity for $\varepsilon$-accurate estimation with empirically stable strong-scaling behavior.

📝 Abstract
The Bayesian posterior distribution can only be evaluated up to a constant of proportionality, which makes simulation and consistent estimation challenging. Classical consistent Bayesian methods such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) have unbounded time-complexity requirements. We develop a fully parallel sequential Monte Carlo (pSMC) method which provably delivers parallel strong scaling, i.e. the time complexity (and per-node memory) remains bounded if the number of asynchronous processes is allowed to grow. More precisely, the pSMC has a theoretical convergence rate of Mean Square Error (MSE)$= O(1/NP)$, where $N$ denotes the number of communicating samples in each processor and $P$ denotes the number of processors. In particular, for suitably large problem-dependent $N$, as $P \rightarrow \infty$ the method converges to infinitesimal accuracy MSE$=O(\varepsilon^2)$ with a fixed finite time complexity Cost$=O(1)$ and with no efficiency leakage, i.e. computational complexity Cost$=O(\varepsilon^{-2})$. A number of Bayesian inference problems are considered to compare the pSMC and MCMC methods.
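The combination rule behind the parallel-SMC idea (run $P$ independent $N$-particle SMC samplers, then average their estimates weighted by each run's normalizing-constant estimate $\hat{Z}$) can be sketched on a toy Gaussian model. This is an illustrative sketch under our own assumptions, not the paper's implementation: the model, the tempering schedule, and the names `one_smc` and `psmc` are all invented for the example.

```python
# Toy pSMC sketch (assumed model, not from the paper): prior N(0,1),
# likelihood y ~ N(theta, 1) with y = 1, so the posterior is N(0.5, 0.5).
import numpy as np

def one_smc(N, y=1.0, n_temps=20, rng=None):
    """One tempered-SMC run: returns (Z_hat, posterior-mean estimate)."""
    rng = rng or np.random.default_rng()
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    theta = rng.standard_normal(N)            # samples from the prior
    log_Z = 0.0
    def loglik(t):                            # log N(y | t, 1), up to a constant
        return -0.5 * (y - t) ** 2
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw = (b - b_prev) * loglik(theta)   # incremental importance weights
        log_Z += np.log(np.mean(np.exp(logw)))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        theta = theta[rng.choice(N, size=N, p=w)]   # multinomial resampling
        # one random-walk Metropolis move targeting prior * likelihood^b
        prop = theta + 0.5 * rng.standard_normal(N)
        log_acc = (b * loglik(prop) - 0.5 * prop ** 2) \
                - (b * loglik(theta) - 0.5 * theta ** 2)
        theta = np.where(np.log(rng.random(N)) < log_acc, prop, theta)
    return np.exp(log_Z), theta.mean()

def psmc(P, N, seed=0):
    """Combine P independent SMC runs, weighting each by its Z_hat."""
    rng = np.random.default_rng(seed)
    Z, m = zip(*(one_smc(N, rng=rng) for _ in range(P)))
    Z = np.asarray(Z)
    return float(np.sum(Z * np.asarray(m)) / Z.sum())

est = psmc(P=16, N=200)
print(est)  # close to the exact posterior mean 0.5
```

The $P$ runs share no state until the final weighted average, which is what makes the per-node cost independent of $P$; in a real deployment each call to `one_smc` would execute on its own processor.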
Problem

Research questions and friction points this paper is trying to address.

Compares parallel SMC and MCMC for Bayesian deep learning
Analyzes convergence properties and communication costs of parallel algorithms
Evaluates performance on MNIST, CIFAR, and IMDb datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel SMC sampler with convergence proof
MCMC parallel chains with bias control
Empirical comparison on MNIST, CIFAR, and IMDb
Xinzhu Liang
School of Mathematics, University of Manchester, Manchester, M13 9PL, United Kingdom
Sanjaya Lohani
Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, Illinois 60607, USA
Joseph M. Lukens
Purdue University; Oak Ridge National Laboratory
Quantum information, photonics, quantum networking, lightwave communications, Bayesian inference
Brian T. Kirby
US Army Research Laboratory
Quantum Information
Thomas A. Searles
Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, Illinois 60607, USA
Kody J. H. Law
Professor at the University of Manchester and AI Research Scientist at Meta
AI, Machine Learning, Computational Statistics, Computational Applied Mathematics