🤖 AI Summary
To make gradient-based MCMC usable for Bayesian posterior sampling under non-differentiable priors, this paper constructs a smooth importance distribution from the Moreau–Yosida (MY) envelope. Existing proximal MCMC methods use the envelope only to build proposals. This work instead uses the MY envelope itself as the importance distribution, so that gradient-based samplers can drive importance sampling. The paper establishes asymptotic normality of the resulting importance sampling estimator and gives an explicit expression for its asymptotic covariance matrix. It also provides sufficient conditions for geometric ergodicity of the Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte Carlo (HMC) when they target this smoothed distribution. Numerical studies in both low- and high-dimensional settings show that the scheme can yield lower-variance estimators than existing proximal MCMC alternatives.
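For a convex function g and a smoothing parameter λ > 0, the Moreau–Yosida envelope is g^λ(x) = min_u { g(u) + ‖x − u‖²/(2λ) }. Its gradient is (x − prox_{λg}(x))/λ, which is (1/λ)-Lipschitz, and g^λ ≤ g everywhere. The sketch below evaluates the envelope and its gradient in one dimension. It assumes the Laplace-type prior term g(x) = α|x|, a common non-differentiable choice picked here only for illustration and not taken from the paper. In that case the proximal map is soft-thresholding. The function names are hypothetical.

```python
import math


def prox_abs(x: float, lam: float, alpha: float) -> float:
    """Proximal operator of g(u) = alpha * |u| with step lam (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam * alpha, 0.0), x)


def my_envelope_abs(x: float, lam: float, alpha: float) -> float:
    """Moreau-Yosida envelope g^lam(x) = g(p) + (x - p)^2 / (2 lam), with p = prox(x)."""
    p = prox_abs(x, lam, alpha)
    return alpha * abs(p) + (x - p) ** 2 / (2.0 * lam)


def my_envelope_grad_abs(x: float, lam: float, alpha: float) -> float:
    """Gradient of the envelope: (x - prox(x)) / lam, defined everywhere including x = 0."""
    return (x - prox_abs(x, lam, alpha)) / lam
```

For this choice of g the envelope reduces to the Huber function. It equals x²/(2λ) when |x| ≤ λα and α|x| − λα²/2 otherwise. It is differentiable at 0, where |x| is not.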
📝 Abstract
The use of non-differentiable priors is standard in modern parsimonious Bayesian models. Lack of differentiability, however, precludes gradient-based Markov chain Monte Carlo (MCMC) methods for posterior sampling. Recently proposed proximal MCMC approaches can partially remedy this limitation. These approaches use gradients of a smooth approximation, constructed via Moreau–Yosida (MY) envelopes, to make proposals. In this work, we build an importance sampling paradigm by using the MY envelope as an importance distribution. Leveraging properties of the envelope, we establish asymptotic normality of the importance sampling estimator with an explicit expression for the asymptotic covariance matrix. Since the MY envelope density is smooth, it is amenable to gradient-based samplers. We provide sufficient conditions for geometric ergodicity of Metropolis-adjusted Langevin and Hamiltonian Monte Carlo algorithms when sampling from this importance distribution. A variety of numerical studies show that the proposed scheme can yield lower-variance estimators compared to existing proximal MCMC alternatives, and is effective in both low and high dimensions.
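The sampling-plus-reweighting scheme can be illustrated by a small one-dimensional sketch. It rests on several assumptions of ours rather than details confirmed by the abstract:

- The posterior is taken as π(x) ∝ exp(−f(x) − g(x)).
- The importance density replaces g by its envelope, π_λ(x) ∝ exp(−f(x) − g^λ(x)).
- MALA draws from π_λ are reweighted by w(x) ∝ exp(g^λ(x) − g(x)) in a self-normalized estimator.

The Gaussian likelihood, the prior g(x) = α|x|, the tuning values and all function names are hypothetical choices for illustration. This is not the authors' implementation, and HMC is not shown.

```python
import math
import random


def prox_abs(x: float, lam: float, alpha: float) -> float:
    """Soft-thresholding: proximal operator of g(u) = alpha * |u| with step lam."""
    return math.copysign(max(abs(x) - lam * alpha, 0.0), x)


def envelope_abs(x: float, lam: float, alpha: float) -> float:
    """Moreau-Yosida envelope g^lam(x) of g(u) = alpha * |u|."""
    p = prox_abs(x, lam, alpha)
    return alpha * abs(p) + (x - p) ** 2 / (2.0 * lam)


def mala_is_estimate(fn, y_obs, sigma, alpha, lam, step, n_iter, burn_in=1000, seed=0):
    """Self-normalized importance sampling estimate of E_pi[fn(X)] from MALA draws of pi_lam.

    Assumed toy model: f(x) = (x - y_obs)^2 / (2 sigma^2), g(x) = alpha * |x|.
    Returns (estimate, MALA acceptance rate after burn-in).
    """
    rng = random.Random(seed)

    def log_target(x):
        # Log-density of the smoothed importance distribution pi_lam, up to a constant.
        return -((x - y_obs) ** 2) / (2.0 * sigma ** 2) - envelope_abs(x, lam, alpha)

    def grad_log_target(x):
        # -grad f(x) minus the envelope gradient (x - prox(x)) / lam.
        return -(x - y_obs) / sigma ** 2 - (x - prox_abs(x, lam, alpha)) / lam

    def log_q(to, frm):
        # Log-density of the MALA proposal N(frm + (step / 2) * grad, step), up to a constant.
        mean = frm + 0.5 * step * grad_log_target(frm)
        return -((to - mean) ** 2) / (2.0 * step)

    x = y_obs
    num = den = 0.0
    accepted = 0
    for i in range(burn_in + n_iter):
        prop = x + 0.5 * step * grad_log_target(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction keeps pi_lam exactly invariant.
        log_acc = log_target(prop) + log_q(x, prop) - log_target(x) - log_q(prop, x)
        if rng.random() < math.exp(min(0.0, log_acc)):
            x = prop
            if i >= burn_in:
                accepted += 1
        if i >= burn_in:
            # Weight pi / pi_lam is proportional to exp(g^lam(x) - g(x)), which is <= 1 since g^lam <= g.
            w = math.exp(envelope_abs(x, lam, alpha) - alpha * abs(x))
            num += w * fn(x)
            den += w
    return num / den, accepted / n_iter
```

For example, `mala_is_estimate(lambda x: x, y_obs=1.0, sigma=1.0, alpha=1.0, lam=0.25, step=0.4, n_iter=50_000)` returns an estimate of the posterior mean under the non-smooth prior, together with the acceptance rate. Because g^λ ≤ g, the weights in this sketch are bounded above by 1. That is one property that makes the envelope a convenient importance density.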