🤖 AI Summary
This work addresses the long-standing lack of theoretical guarantees for the convergence of unadjusted Hamiltonian Monte Carlo (uHMC) under strong information divergences such as the KL and Rényi divergences. By leveraging one-shot couplings and a kernel regularization analysis, the study systematically extends mixing-time and bias bounds, previously established in the Wasserstein-2 and Orlicz–Wasserstein metrics, to these stronger divergences, thereby establishing tail-sensitive convergence bounds for the uHMC transition kernel. The analysis not only quantifies the impact of discretization bias on KL and Rényi divergences but also provides a theoretical foundation for warm-starting both unadjusted samplers and Metropolis-adjusted chains.
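For reference, the two divergences in question are defined as follows (standard definitions, not taken from the paper itself); the Rényi divergence of order $q > 1$ recovers the KL divergence in the limit $q \to 1$:

```latex
\[
\mathrm{KL}(\mu \,\|\, \pi) \;=\; \int \log\frac{d\mu}{d\pi}\, d\mu,
\qquad
\mathcal{R}_q(\mu \,\|\, \pi) \;=\; \frac{1}{q-1}\,
\log \int \Big(\frac{d\mu}{d\pi}\Big)^{q}\, d\pi .
\]
```

Both quantities depend on the relative density $d\mu/d\pi$, which is why they control acceptance probabilities and warm-start requirements more tightly than transport metrics such as Wasserstein distances.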
📝 Abstract
Hamiltonian Monte Carlo (HMC) algorithms are among the most widely used sampling methods in high-dimensional settings, yet their convergence properties are poorly understood in divergences that quantify relative density mismatch, such as the Kullback-Leibler (KL) and Rényi divergences. These divergences naturally govern acceptance probabilities and warm-start requirements for Metropolis-adjusted Markov chains. In this work, we develop a framework for upgrading Wasserstein convergence guarantees for unadjusted Hamiltonian Monte Carlo (uHMC) to guarantees in the tail-sensitive KL and Rényi divergences. Our approach is based on one-shot couplings, which we use to establish a regularization property of the uHMC transition kernel. This regularization allows Wasserstein-2 mixing-time and asymptotic bias bounds to be lifted to KL divergence, and analogous Orlicz–Wasserstein bounds to be lifted to Rényi divergence, paralleling earlier work of Bou-Rabee and Eberle (2023) that upgrades Wasserstein-1 bounds to total variation distance via kernel smoothing. As a consequence, our results provide quantitative control of relative density mismatch, clarify the role of discretization bias in strong divergences, and yield principled guarantees relevant both for unadjusted sampling and for generating warm starts for Metropolis-adjusted Markov chains.
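To make the object of study concrete, here is a minimal sketch of one uHMC transition: momentum is resampled, the Hamiltonian flow is approximated by a leapfrog integrator, and, unlike Metropolis-adjusted HMC, no accept/reject correction is applied, which is the source of the discretization bias the paper quantifies. The step size, trajectory length, and function names below are illustrative choices, not the paper's specific scheme.

```python
import numpy as np

def uhmc_step(x, grad_log_pi, step_size=0.1, n_leapfrog=10, rng=None):
    """One unadjusted HMC transition targeting pi: resample momentum,
    run a leapfrog trajectory, and return the endpoint WITHOUT a
    Metropolis accept/reject step (hence a biased stationary law)."""
    rng = np.random.default_rng() if rng is None else rng
    x = x.copy()
    v = rng.standard_normal(x.shape)            # fresh Gaussian momentum
    v += 0.5 * step_size * grad_log_pi(x)       # initial momentum half-step
    for _ in range(n_leapfrog - 1):
        x += step_size * v                      # full position step
        v += step_size * grad_log_pi(x)         # full momentum step
    x += step_size * v                          # last position step
    v += 0.5 * step_size * grad_log_pi(x)       # final momentum half-step
    return x                                    # momentum is discarded

# Toy usage: sample from a standard 2D Gaussian, pi(x) ∝ exp(-|x|^2 / 2).
rng = np.random.default_rng(0)
grad_log_pi = lambda x: -x
x = np.zeros(2)
samples = []
for _ in range(1000):
    x = uhmc_step(x, grad_log_pi, rng=rng)
    samples.append(x.copy())
samples = np.array(samples)
```

For a Gaussian target the leapfrog integrator is nearly exact at this step size, so the empirical moments land close to those of the target; for rougher targets the missing accept/reject step produces the asymptotic bias that the paper's KL and Rényi bounds control.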