🤖 AI Summary
This work addresses a source of inefficiency in high-dimensional Hamiltonian Monte Carlo (HMC) sampling: the large constant factor induced by the curvature of the target density, which preconditioning can reduce. The authors propose a novel preconditioning approach based on minimizing Fisher divergence, estimating a linear transformation under which the target density is approximated by a standard normal distribution. Three variants of the preconditioning matrix are constructed: diagonal, dense, and low-rank plus diagonal. Empirical evaluation across 114 models from posteriordb demonstrates that the proposed estimators substantially outperform the variance-based strategies used by Stan and PyMC: the diagonal variant achieves a median 1.3× speedup, and the low-rank plus diagonal variant a median 4× speedup, significantly enhancing HMC sampling efficiency.
📝 Abstract
Although Hamiltonian Monte Carlo (HMC) scales as O(d^(1/4)) in dimension, there is a large constant factor determined by the curvature of the target density. This constant factor can be reduced in most cases through preconditioning, the state of the art for which uses diagonal or dense penalized maximum likelihood estimation of (co)variance based on a sample of warmup draws. These estimates converge slowly in the diagonal case and scale poorly when expanded to the dense case. We propose a more effective estimator based on minimizing the sample Fisher divergence from a linearly transformed density to a standard normal distribution. We present this estimator in three forms: (a) diagonal, (b) dense, and (c) low-rank plus diagonal. Using a collection of 114 models from posteriordb, we demonstrate that the diagonal minimizer of Fisher divergence outperforms the industry-standard variance-based diagonal estimators used by Stan and PyMC by a median factor of 1.3. The low-rank plus diagonal minimizer of the Fisher divergence outperforms Stan and PyMC's diagonal estimators by a median factor of 4.
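To make the diagonal case concrete, here is a minimal sketch of what a Fisher-divergence-minimizing diagonal estimator can look like. For a diagonal linear map x = s ⊙ z, minimizing the empirical Fisher divergence sum over E[(s_i g_i + x_i/s_i)²], where g = ∇ log π(x), admits the closed form s_i² = sqrt(E[x_i²]/E[g_i²]). This derivation and the function name `diag_fisher_preconditioner` are our own illustration, not taken from the paper; the paper's actual estimator may differ (e.g., in centering or regularization).

```python
import numpy as np

def diag_fisher_preconditioner(draws, grads):
    """Diagonal scale minimizing the empirical Fisher divergence
    from the rescaled density to a standard normal (illustrative sketch).

    For x = s * z elementwise, the objective
        sum_i E[(s_i * g_i + x_i / s_i)^2]
    is minimized at s_i^2 = sqrt(E[x_i^2] / E[g_i^2]).

    draws: (num_draws, dim) warmup draws x
    grads: (num_draws, dim) gradients of log density at those draws
    Returns the elementwise variances s_i^2 of the preconditioner.
    """
    ex2 = np.mean(draws ** 2, axis=0)   # E[x_i^2]
    eg2 = np.mean(grads ** 2, axis=0)   # E[g_i^2]
    return np.sqrt(ex2 / eg2)

# Sanity check on a Gaussian target N(0, diag(sigma2)),
# where grad log p(x) = -x / sigma2 and the optimum is exactly sigma2.
rng = np.random.default_rng(0)
sigma2 = np.array([0.25, 1.0, 9.0])
x = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, 3))
g = -x / sigma2
print(diag_fisher_preconditioner(x, g))  # close to [0.25, 1.0, 9.0]
```

Note how the estimator pools information from both draws and gradients, which is one intuition for why it can converge faster than a variance-only estimate computed from the same warmup sample.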