Massively Parallel Expectation Maximization For Approximate Posteriors

📅 2025-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses three problems. MCMC scales poorly to large hierarchical Bayesian models, and both variational inference (VI) and reweighted wake-sleep (RWS) converge slowly and are sensitive to reparameterization. To address these, it proposes QEM, a fast posterior inference method based on expectation maximization (EM). In the E-step, QEM uses massively parallel importance weighting to estimate posterior moments. In the M-step, it fits a simple approximate posterior (for example Gaussian, Gamma, Beta, or Dirichlet) to those moments by moment matching, so no gradient-based optimization is needed. The key idea is to combine massively parallel importance weighting with the EM framework, which makes learning the approximate posterior invariant to reparameterizations of the model. Experiments show that QEM converges faster than massively parallel variants of RWS and VI, while remaining computationally efficient and accurate across several hierarchical models.
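
A minimal sketch of the E-step/M-step loop described above, for a single scalar latent variable and a Gaussian approximate posterior. Several things here are illustrative assumptions and not taken from the paper: the toy model (x ~ N(0, 1), y_i | x ~ N(x, noise_sd^2)), the function names, and the choice of estimator. The E-step uses plain self-normalized importance sampling with the current q as the proposal. It is not the massively parallel estimator of Bowyer et al. (2024), which, as we understand it, draws K samples per latent variable and sums over their combinations using the model's factorization.

```python
import math
import random


def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def log_joint(x, ys, noise_sd):
    """Assumed toy model: x ~ N(0, 1), y_i | x ~ N(x, noise_sd^2)."""
    return log_normal(x, 0.0, 1.0) + sum(log_normal(y, x, noise_sd) for y in ys)


def qem_gaussian(ys, noise_sd, n_iters=10, K=5000, seed=0):
    """Fit q(x) = N(m, s^2) by alternating an importance-weighted E-step
    with a Gaussian moment-matching M-step (hypothetical helper)."""
    rng = random.Random(seed)
    m, s = 0.0, 1.0  # initial approximate posterior
    for _ in range(n_iters):
        # E-step: sample from the current q and compute log importance weights.
        xs = [rng.gauss(m, s) for _ in range(K)]
        logw = [log_joint(x, ys, noise_sd) - log_normal(x, m, s) for x in xs]
        # Self-normalize, subtracting the max log weight for numerical stability.
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        mean = sum(wi * x for wi, x in zip(w, xs)) / total
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / total
        # M-step: for a Gaussian q, matching mean and variance is closed form.
        m, s = mean, math.sqrt(max(var, 1e-12))
    return m, s


if __name__ == "__main__":
    ys = [1.2, 0.8, 1.5, 0.9, 1.1]
    m, s = qem_gaussian(ys, noise_sd=0.5)
    # This toy model is conjugate, so the exact posterior is available for comparison.
    precision = 1.0 + len(ys) / 0.5 ** 2
    print("QEM:  ", m, s)
    print("exact:", sum(ys) / 0.5 ** 2 / precision, precision ** -0.5)
```

Because the exact posterior in this toy model is Gaussian, the loop's fixed point matches it up to Monte Carlo error. In non-conjugate models the fit is only an approximation within the chosen family.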

📝 Abstract

Bayesian inference for hierarchical models can be very challenging. MCMC methods have difficulty scaling to large models with many observations and latent variables. While variational inference (VI) and reweighted wake-sleep (RWS) can be more scalable, they are gradient-based methods and so often require many iterations to converge. Our key insight was that modern massively parallel importance weighting methods (Bowyer et al., 2024) give fast and accurate posterior moment estimates, and we can use these moment estimates to rapidly learn an approximate posterior. Specifically, we propose using expectation maximization to fit the approximate posterior, which we call QEM. The expectation step involves computing the posterior moments using high-quality massively parallel estimates from Bowyer et al. (2024). The maximization step involves fitting the approximate posterior using these moments, which can be done straightforwardly for simple approximate posteriors such as Gaussian, Gamma, Beta, Dirichlet, Binomial, Multinomial, Categorical, etc. (or combinations thereof). We show that QEM is faster than state-of-the-art, massively parallel variants of RWS and VI, and is invariant to reparameterizations of the model that dramatically slow down gradient-based methods.
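
The reparameterization-invariance claim can be checked in its simplest case with the sketch below. It runs a Gaussian QEM loop on one toy model in two parameterizations: x, and the rescaled z = 1000 * x. The model, the scale factor, and all function names are illustrative assumptions, and the E-step is again plain self-normalized importance sampling rather than the Bowyer et al. (2024) estimator. Two things make the runs agree. The Jacobian of the rescaling cancels in the ratio p(z, y) / q(z), and Gaussian moment matching commutes with linear maps. As a result, the two runs produce the same fitted posterior up to floating-point rounding. Only a linear rescaling is checked here.

```python
import math
import random


def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def qem_gaussian(log_joint, m, s, n_iters=10, K=2000, seed=0):
    """Hypothetical Gaussian QEM sketch: importance-weighted moments, then moment matching."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Same seed, so both runs below use identical standard-normal draws.
        xs = [m + s * rng.gauss(0.0, 1.0) for _ in range(K)]
        logw = [log_joint(x) - log_normal(x, m, s) for x in xs]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        mean = sum(wi * x for wi, x in zip(w, xs)) / total
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / total
        m, s = mean, math.sqrt(max(var, 1e-12))
    return m, s


ys = [1.2, 0.8, 1.5, 0.9, 1.1]
noise_sd, c = 0.5, 1000.0


def log_joint_x(x):
    # Original parameterization: x ~ N(0, 1), y_i | x ~ N(x, noise_sd^2).
    return log_normal(x, 0.0, 1.0) + sum(log_normal(y, x, noise_sd) for y in ys)


def log_joint_z(z):
    # Rescaled parameterization z = c * x: z ~ N(0, c^2), y_i | z ~ N(z / c, noise_sd^2).
    return log_normal(z, 0.0, c) + sum(log_normal(y, z / c, noise_sd) for y in ys)


m_x, s_x = qem_gaussian(log_joint_x, m=0.0, s=1.0)
m_z, s_z = qem_gaussian(log_joint_z, m=0.0, s=c)  # initial q mapped through z = c * x

if __name__ == "__main__":
    print(m_x, m_z / c)  # equal up to floating-point rounding
    print(s_x, s_z / c)
```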

Problem

Scalable Bayesian inference for hierarchical models with many observations and latent variables.

Overcoming slow convergence of gradient-based variational inference and reweighted wake-sleep methods.

Fast and accurate posterior moment estimation using massively parallel importance weighting.

Innovation

Massively parallel importance weighting for fast posterior estimates

Expectation maximization to fit approximate posteriors (QEM)

Straightforward fitting of simple approximate posteriors (Gaussian, Gamma, Beta, Dirichlet, etc.) from high-quality moment estimates
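
For the non-Gaussian families named above, the M-step amounts to solving for distribution parameters given moment estimates. Below is a minimal sketch of one such scheme, assuming the matched statistics are the mean and variance (method of moments); the function names are hypothetical. Within an exponential family, the fit that minimizes KL(p || q) instead matches the expected sufficient statistics, for example E[x] and E[log x] for a Gamma, which generally needs a numerical solve. The paper may use that variant or another one.

```python
def gamma_from_moments(mean, var):
    """Shape k and rate beta of the Gamma with this mean and variance:
    mean = k / beta and var = k / beta^2."""
    if mean <= 0 or var <= 0:
        raise ValueError("Gamma requires positive mean and variance")
    return mean * mean / var, mean / var


def beta_from_moments(mean, var):
    """Alpha and beta of the Beta with this mean and variance."""
    if not (0 < mean < 1 and 0 < var < mean * (1 - mean)):
        raise ValueError("need 0 < mean < 1 and 0 < var < mean * (1 - mean)")
    nu = mean * (1 - mean) / var - 1  # nu = alpha + beta
    return mean * nu, (1 - mean) * nu


def dirichlet_from_moments(means, variances):
    """Concentration vector of a Dirichlet from per-component means and variances.
    Each component satisfies Var[x_i] = m_i * (1 - m_i) / (alpha0 + 1); the
    alpha0 implied by each component is averaged."""
    estimates = [m * (1 - m) / v - 1 for m, v in zip(means, variances)]
    alpha0 = sum(estimates) / len(estimates)
    return [alpha0 * m for m in means]


if __name__ == "__main__":
    print(gamma_from_moments(1.5, 0.75))  # (3.0, 2.0)
```

Averaging the per-component estimates of alpha0 is only one simple way to combine them. With noisy importance-weighted moments the components will disagree slightly, and other weightings are equally reasonable.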
