🤖 AI Summary
Sampling from multimodal distributions with well-separated modes can suffer from exponentially slow inter-mode mixing, even with tempering. Method: We propose Reweighted ALPS (Re-ALPS), a modification of the Annealed Leap-Point Sampler (ALPS) that assumes a warm start point for each mode, constructs distributions tilted towards a mixture centered at these points, uses teleportation jumps between warm start points at the coldest level, and estimates component partition functions via Monte Carlo, dispensing with Hessian computations and Gaussian approximations. Contribution/Results: We establish the first polynomial-time bound that holds in a general setting, under the assumption that each component contains significant mass relative to the others when tilted towards its warm start point. The proof relies on convergence results for Markov processes in which only part of the stationary distribution is well-mixing, together with partition function estimation for individual mixture components. Experiments compare the method's mixing performance against ALPS on a mixture of heavy-tailed distributions.
📝 Abstract
Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. In light of hardness results for sampling (classical MCMC methods, even with tempering, can suffer from exponential mixing times), a natural question is how to leverage additional information, such as a warm start point for each mode, to enable faster mixing across modes. To address this, we introduce Reweighted ALPS (Re-ALPS), a modified version of the Annealed Leap-Point Sampler (ALPS) that dispenses with the Gaussian approximation assumption. We prove the first polynomial-time bound that works in a general setting, under a natural assumption that each component contains significant mass relative to the others when tilted towards the corresponding warm start point. Similarly to ALPS, we define distributions tilted towards a mixture centered at the warm start points and, at the coldest level, use teleportation between warm start points to enable efficient mixing across modes. In contrast to ALPS, our method does not require Hessian information at the modes, but instead estimates component partition functions via Monte Carlo. This additional estimation step is crucial in allowing the algorithm to handle target distributions with geometries more complex than approximately Gaussian. For the proof, we show convergence results for Markov processes when only part of the stationary distribution is well-mixing, and estimation results for the partition functions of individual components of a mixture. We numerically evaluate our algorithm's mixing performance compared to ALPS on a mixture of heavy-tailed distributions.
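To make the idea of estimating component partition functions by Monte Carlo more concrete, here is a minimal sketch. It is not the paper's estimator. It assumes, purely for illustration, that component k is the unnormalized target multiplied by a Gaussian tilt exp(-beta * ||x - x_k||^2 / 2) centered at the warm start point x_k, so that its partition function equals (2*pi/beta)^(d/2) times the expectation of the unnormalized target under N(x_k, I/beta). The names `estimate_log_partition`, `component_weights`, `log_p_tilde`, and `beta` are hypothetical.

```python
import numpy as np


def estimate_log_partition(log_p_tilde, center, beta, n_samples=100_000, rng=None):
    """Monte Carlo estimate of log Z_k for an assumed Gaussian-tilted component.

    Assumption (illustrative only):
        Z_k = integral of p_tilde(x) * exp(-beta * ||x - center||^2 / 2) dx
            = (2*pi/beta)^(d/2) * E_{x ~ N(center, I/beta)}[p_tilde(x)].
    `log_p_tilde` maps an (n, d) array to n log-density values.
    """
    rng = np.random.default_rng() if rng is None else rng
    center = np.atleast_1d(np.asarray(center, dtype=float))
    d = center.shape[0]
    # Draw samples from the Gaussian tilt N(center, I / beta).
    x = center + rng.standard_normal((n_samples, d)) / np.sqrt(beta)
    log_vals = log_p_tilde(x)
    # Log of the sample mean, computed stably by factoring out the maximum.
    m = np.max(log_vals)
    log_mean = m + np.log(np.mean(np.exp(log_vals - m)))
    # Add the log normalizing constant of the Gaussian tilt.
    return 0.5 * d * np.log(2.0 * np.pi / beta) + log_mean


def component_weights(log_z):
    """Normalize estimated log partition functions into mixture weights."""
    log_z = np.asarray(log_z, dtype=float)
    w = np.exp(log_z - np.max(log_z))
    return w / np.sum(w)


# Usage: a toy 1-D unnormalized target with two hypothetical warm start points.
def toy_log_target(x):
    return -0.5 * np.sum(x**2, axis=1)


gen = np.random.default_rng(0)
warm_starts = [[-1.0], [1.0]]
log_zs = [estimate_log_partition(toy_log_target, c, beta=4.0, rng=gen)
          for c in warm_starts]
weights = component_weights(log_zs)
```

In a sampler along these lines, such estimated weights could be used to reweight the components of the tilted mixture so that no single mode dominates. How Re-ALPS actually uses its estimates is specified in the paper, not in this sketch.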