BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation

📅 2025-07-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Computing the Jacobian determinant needed for sample likelihoods in Boltzmann generators is computationally expensive and hinders scaling to large molecular systems. Method: We propose an efficient generative framework that avoids explicit Jacobian computation by combining continuous normalizing flows, stochastic interpolation paths, noise contrastive estimation (NCE), and score matching, guided by the molecular energy function to jointly optimize the likelihood and the gradient (score) of the generated distribution. Contribution/Results: We incorporate stochastic interpolants and contrastive learning into Boltzmann generators, achieving the first NCE-driven training of flow-based models and bypassing the exact likelihood evaluation and Jacobian integration required by conventional normalizing flows. On the alanine dipeptide benchmark, the method reconstructs free-energy landscapes with high fidelity to reference methods, achieving a free-energy difference error below 0.5 kcal/mol while accelerating sampling by two to three orders of magnitude.
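The core trick the summary describes, noise contrastive estimation, can be illustrated on a toy problem. The sketch below is not the paper's model: it fits a 1-D unnormalized log-density by logistic regression against a known noise distribution, which is the textbook NCE setup (the optimal classifier's logit equals the log density ratio). All names (`theta`, `features`, the quadratic parametrization) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Generated" distribution whose density we want to learn: N(1, 0.5^2).
x_data = rng.normal(1.0, 0.5, size=5000)

# Noise distribution with a known density: standard normal.
x_noise = rng.normal(0.0, 1.0, size=5000)

def noise_logpdf(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

# Parametrize an unnormalized log-density log q_theta(x) = a*x^2 + b*x + c.
# NCE trains a classifier whose logit is log q_theta(x) - log p_noise(x);
# at the optimum, q_theta matches the data density (a -> -2, b -> 4 here).
def features(x):
    return np.stack([x**2, x, np.ones_like(x)], axis=1)

X = np.concatenate([x_data, x_noise])
y = np.concatenate([np.ones_like(x_data), np.zeros_like(x_noise)])
F = features(X)
ln = noise_logpdf(X)

theta = np.zeros(3)
lr = 0.2
for _ in range(5000):
    logits = F @ theta - ln           # log q_theta(x) - log p_noise(x)
    p = 1.0 / (1.0 + np.exp(-logits))
    grad = F.T @ (p - y) / len(y)     # gradient of binary cross-entropy
    theta -= lr * grad

a, b, _ = theta
print(a, b)
```

In BoltzNCE the same density-ratio objective is applied to an energy-based model of the flow's generated distribution, so that sample likelihoods come from the learned model rather than from Jacobian integration.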

📝 Abstract
Efficient sampling from the Boltzmann distribution defined by an energy function is a key challenge in modeling physical systems such as molecules. Boltzmann Generators tackle this by leveraging Continuous Normalizing Flows that transform a simple prior into a distribution that can be reweighted to match the Boltzmann distribution using sample likelihoods. However, obtaining likelihoods requires computing costly Jacobians during integration, making it impractical for large molecular systems. To overcome this, we propose learning the likelihood of the generated distribution via an energy-based model trained with noise contrastive estimation and score matching. By using stochastic interpolants to anneal between the prior and generated distributions, we combine both objectives to efficiently learn the density function. On the alanine dipeptide system, we demonstrate that our method yields free energy profiles and energy distributions comparable to those obtained with exact likelihoods. Additionally, we show that free energy differences between metastable states can be estimated accurately with orders-of-magnitude speedup.
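The reweighting step the abstract refers to is standard self-normalized importance sampling: each generated sample gets a weight proportional to exp(-E(x)) / q(x), which is exactly why the generator's likelihood q(x) must be available. A minimal sketch on a toy 1-D energy, with an analytically known proposal standing in for the flow (all specifics here are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy energy (stand-in for a molecular energy): E(x) = x^2 / 2,
# so the Boltzmann distribution exp(-E) is a standard normal.
def energy(x):
    return 0.5 * x**2

# Imperfect "generator" q = N(0.5, 1.2^2), whose log-likelihood we can
# evaluate in closed form (the costly step that BoltzNCE learns instead).
mu_q, sigma_q = 0.5, 1.2
x = rng.normal(mu_q, sigma_q, size=20000)
log_q = -0.5 * ((x - mu_q) / sigma_q) ** 2 - np.log(sigma_q * np.sqrt(2 * np.pi))

# Self-normalized importance weights: w propto exp(-E(x)) / q(x).
log_w = -energy(x) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Reweighted estimate of <x^2> under the Boltzmann distribution (exact: 1).
est = (w * x**2).sum()
print(est)
```

Replacing the closed-form `log_q` with a learned energy-based estimate is the substitution the paper makes for high-dimensional molecular systems.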
Problem

Research questions and friction points this paper is trying to address.

Efficient sampling from the Boltzmann distribution for molecular systems
Avoiding costly Jacobians in likelihood computation for large systems
Accurate free energy estimation with significant speedup
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses noise contrastive estimation for likelihood learning
Employs stochastic interpolants for distribution annealing
Combines score matching to learn density functions
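The second bullet, stochastic interpolants, refers to a noisy path that bridges the prior and the generated distribution so that NCE and score-matching losses can be applied at intermediate times. A minimal sketch of one common interpolant form, x_t = (1-t)·x0 + t·x1 + γ(t)·z with γ vanishing at the endpoints; the schedule here is an assumption, not necessarily the one used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def interpolant(x0, x1, t, z):
    """Linear stochastic interpolant with latent noise term gamma(t) * z.

    gamma(t) = sqrt(t * (1 - t)) vanishes at t = 0 and t = 1, so the path
    starts exactly at the prior sample x0 and ends at the sample x1.
    (One common choice; the paper's schedule may differ.)
    """
    gamma = np.sqrt(t * (1.0 - t))
    return (1.0 - t) * x0 + t * x1 + gamma * z

x0 = rng.normal(size=1000)             # prior samples
x1 = rng.normal(3.0, 0.5, size=1000)   # samples from the generator
z = rng.normal(size=1000)              # latent noise

# Endpoints recover the two distributions exactly.
ok0 = np.allclose(interpolant(x0, x1, 0.0, z), x0)
ok1 = np.allclose(interpolant(x0, x1, 1.0, z), x1)
print(ok0, ok1)
```

Annealing along such a path gives a continuum of intermediate densities, which is what lets the contrastive and score-matching objectives be combined into a single training scheme.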
Rishal Aggarwal
University of Pittsburgh
Computational Biology, Drug Design, Machine Learning
Jacky Chen
CMU-Pitt Computational Biology, Dept. of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260
Nicholas M. Boffi
CMU
Machine Learning, Applied Mathematics, Artificial Intelligence
David Ryan Koes
Dept. of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260