AI Summary
Sampling molecular conformational distributions is essential for predicting ensemble-averaged physical properties, yet conventional molecular dynamics (MD) and Markov chain Monte Carlo (MCMC) methods are computationally expensive, and obtaining unbiased samples from them requires satisfying ergodicity. To address this, we propose Potential Score Matching (PSM), a framework that incorporates potential energy gradients into the score estimation of diffusion models, without requiring exact energy functions. PSM can debias the learned distribution toward the Boltzmann distribution even when trained on limited and biased data, reducing reliance on long MD simulations and MCMC sampling. This improves both sampling efficiency and physical consistency. On the Lennard-Jones (LJ) model, PSM outperforms existing state-of-the-art methods; on the high-dimensional MD17 and MD22 benchmarks, it generates conformational distributions closer to the Boltzmann distribution than traditional diffusion models do.
Abstract
The ensemble average of physical properties of molecules is closely related to the distribution of molecular conformations, and sampling such distributions is a fundamental challenge in physics and chemistry. Traditional methods like molecular dynamics (MD) simulations and Markov chain Monte Carlo (MCMC) sampling are commonly used but can be time-consuming and costly. Recently, diffusion models have emerged as efficient alternatives by learning the distribution of training data. However, obtaining an unbiased target distribution is still an expensive task, primarily because it requires satisfying ergodicity. To tackle these challenges, we propose Potential Score Matching (PSM), an approach that utilizes the potential energy gradient to guide generative models. PSM does not require exact energy functions and can debias sample distributions even when trained on limited and biased data. Our method outperforms existing state-of-the-art (SOTA) models on the Lennard-Jones (LJ) potential, a commonly used toy model. Furthermore, we extend the evaluation of PSM to high-dimensional problems using the MD17 and MD22 datasets. The results demonstrate that molecular distributions generated by PSM approximate the Boltzmann distribution more closely than those generated by traditional diffusion models.
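To make the link between a potential energy gradient and a score concrete: for a Boltzmann distribution p(x) ∝ exp(-U(x)/kT), the score is ∇ log p(x) = -∇U(x)/kT, so the gradient of the potential (the negative force) determines the score of the target distribution. The sketch below evaluates this for a small Lennard-Jones cluster. It is not the PSM training objective, which the abstract does not specify. The function names (`lj_energy`, `lj_gradient`, `boltzmann_score`) and the reduced units (epsilon = sigma = kT = 1) are assumptions made for illustration.

```python
import numpy as np


def lj_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy; positions has shape (N, 3)."""
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N)
    i, j = np.triu_indices(len(positions), k=1)           # count each pair once
    sr6 = (sigma / dist[i, j]) ** 6
    return float(np.sum(4.0 * epsilon * (sr6 ** 2 - sr6)))


def lj_gradient(positions, epsilon=1.0, sigma=1.0):
    """Analytic gradient dU/dx of the total energy, shape (N, 3)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a particle does not interact with itself
    sr6 = (sigma / dist) ** 6
    # Pairwise radial derivative: dU/dr = -24 * eps * (2 * (s/r)^12 - (s/r)^6) / r
    dU_dr = -24.0 * epsilon * (2.0 * sr6 ** 2 - sr6) / dist
    # Chain rule: dr_ij/dx_i = (x_i - x_j) / r_ij, summed over partners j
    return np.sum((dU_dr / dist)[:, :, None] * diff, axis=1)


def boltzmann_score(positions, kT=1.0, epsilon=1.0, sigma=1.0):
    """Score of p(x) ∝ exp(-U(x)/kT): grad log p = -grad U / kT."""
    return -lj_gradient(positions, epsilon, sigma) / kT


# Illustrative three-particle configuration (hypothetical coordinates).
cluster = np.array([[0.0, 0.0, 0.0],
                    [1.1, 0.0, 0.0],
                    [0.0, 1.2, 0.3]])
score = boltzmann_score(cluster, kT=1.0)  # shape (3, 3), one vector per particle
```

For a dimer at the equilibrium separation 2^(1/6)·sigma, the gradient and therefore the score vanish, which is a quick sanity check on the sign conventions.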