SMC Is All You Need: Parallel Strong Scaling

📅 2024-02-09
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
To address the unbounded time complexity and poor parallel scalability of conventional SMC/MCMC methods in Bayesian deep learning, this paper proposes a fully parallelized sequential Monte Carlo (pSMC) framework. The method achieves provable strong scaling: mean squared error (MSE) decays as $O(1/NP)$, with bounded per-step computational cost and no efficiency loss as the number of processors $P \to \infty$. Key elements include asynchronously distributed SMC runs, adaptive resampling, low-cost inter-node sample communication, and a rigorous convergence analysis. Experiments across multiple Bayesian inference tasks show that pSMC significantly outperforms state-of-the-art MCMC methods, attaining the optimal $O(\varepsilon^{-2})$ computational complexity for $\varepsilon$-accurate estimation with empirically stable strong-scaling behavior.

📝 Abstract
The Bayesian posterior distribution can only be evaluated up to a constant of proportionality, which makes simulation and consistent estimation challenging. Classical consistent Bayesian methods such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) have unbounded time-complexity requirements. We develop a fully parallel sequential Monte Carlo (pSMC) method which provably delivers parallel strong scaling, i.e. the time complexity (and per-node memory) remains bounded if the number of asynchronous processes is allowed to grow. More precisely, the pSMC has a theoretical convergence rate of Mean Square Error (MSE)$= O(1/NP)$, where $N$ denotes the number of communicating samples in each processor and $P$ denotes the number of processors. In particular, for suitably large problem-dependent $N$, as $P \rightarrow \infty$ the method converges to infinitesimal accuracy MSE$=O(\varepsilon^2)$ with a fixed finite time complexity Cost$=O(1)$ and with no efficiency leakage, i.e. computational complexity Cost$=O(\varepsilon^{-2})$. A number of Bayesian inference problems are considered to compare the pSMC and MCMC methods.
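The combination rule behind the parallel-SMC idea (run $P$ independent $N$-particle SMC samplers, then average their estimates weighted by each run's normalizing-constant estimate $\hat{Z}$) can be sketched on a toy Gaussian model. This is an illustrative sketch under our own assumptions, not the paper's implementation: the model, the tempering schedule, and the names `one_smc` and `psmc` are all invented for the example.

```python
# Toy pSMC sketch (assumed model, not from the paper): prior N(0,1),
# likelihood y ~ N(theta, 1) with y = 1, so the posterior is N(0.5, 0.5).
import numpy as np

def one_smc(N, y=1.0, n_temps=20, rng=None):
    """One tempered-SMC run: returns (Z_hat, posterior-mean estimate)."""
    rng = rng or np.random.default_rng()
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    theta = rng.standard_normal(N)            # samples from the prior
    log_Z = 0.0
    def loglik(t):                            # log N(y | t, 1), up to a constant
        return -0.5 * (y - t) ** 2
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw = (b - b_prev) * loglik(theta)   # incremental importance weights
        log_Z += np.log(np.mean(np.exp(logw)))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        theta = theta[rng.choice(N, size=N, p=w)]   # multinomial resampling
        # one random-walk Metropolis move targeting prior * likelihood^b
        prop = theta + 0.5 * rng.standard_normal(N)
        log_acc = (b * loglik(prop) - 0.5 * prop ** 2) \
                - (b * loglik(theta) - 0.5 * theta ** 2)
        theta = np.where(np.log(rng.random(N)) < log_acc, prop, theta)
    return np.exp(log_Z), theta.mean()

def psmc(P, N, seed=0):
    """Combine P independent SMC runs, weighting each by its Z_hat."""
    rng = np.random.default_rng(seed)
    Z, m = zip(*(one_smc(N, rng=rng) for _ in range(P)))
    Z = np.asarray(Z)
    return float(np.sum(Z * np.asarray(m)) / Z.sum())

est = psmc(P=16, N=200)
print(est)  # close to the exact posterior mean 0.5
```

The $P$ runs share no state until the final weighted average, which is what makes the per-node cost independent of $P$; in a real deployment each call to `one_smc` would execute on its own processor.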
Problem

Research questions and friction points this paper is trying to address.

Compares parallel SMC and MCMC for Bayesian deep learning
Analyzes convergence properties and communication costs of parallel algorithms
Evaluates performance on MNIST, CIFAR, and IMDb datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel SMC sampler with convergence proof
MCMC parallel chains with bias control
Empirical comparison on MNIST, CIFAR, and IMDb
Xinzhu Liang
School of Mathematics, University of Manchester, Manchester, M13 9PL, United Kingdom
Sanjaya Lohani
Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, Illinois 60607, USA
Joseph M. Lukens
Purdue University; Oak Ridge National Laboratory
Quantum information, photonics, quantum networking, lightwave communications, Bayesian inference
Brian T. Kirby
US Army Research Laboratory
Quantum Information
Thomas A. Searles
Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, Illinois 60607, USA
Kody J. H. Law
Professor at the University of Manchester and AI Research Scientist at Meta
AI, Machine Learning, Computational Statistics, Computational Applied Mathematics