🤖 AI Summary
Estimating binding free energies with MM/GBSA is computationally expensive, and most learned molecular representations are built from static structures, which limits how well they generalize to novel protein–ligand systems. To address this, we propose SurGBSA (SURrogate mmGBSA), a surrogate model of MM/GBSA for MD-based representation learning, and show for the first time the benefits of physics-informed pretraining for such a model. SurGBSA is pretrained on over 1.4 million 3D trajectories from MD simulations of the CASF-2016 benchmark, yielding transferable, physics-aware representations. On pose ranking, SurGBSA nearly matches single-point MM/GBSA accuracy at identifying the correct top pose (−0.4% difference) while running 6,497× faster than the physics-based single-point calculation. This work advances molecular foundation models by showing that training on MD simulations improves model performance. Models, code, and training data are publicly released.
📝 Abstract
Self-supervised pretraining from static structures of drug-like compounds and proteins enables powerful learned feature representations. Learned features demonstrate state-of-the-art performance on a range of predictive tasks, including molecular properties, structure generation, and protein-ligand interactions. The majority of approaches are limited by their use of static structures, and it remains an open question how best to use atomistic molecular dynamics (MD) simulations to develop more generalized models that improve prediction accuracy for novel molecular structures. We present SURrogate mmGBSA (SurGBSA) as a new modeling approach for MD-based representation learning, which learns a surrogate function of the Molecular Mechanics Generalized Born Surface Area (MMGBSA). We show for the first time the benefits of physics-informed pre-training to train a surrogate MMGBSA model on a collection of over 1.4 million 3D trajectories collected from MD simulations of the CASF-2016 benchmark. SurGBSA demonstrates a dramatic 6,497x speedup over a traditional physics-based single-point MMGBSA calculation while nearly matching single-point MMGBSA accuracy on the challenging pose ranking problem of identifying the correct top pose (-0.4% difference). Our work advances the development of molecular foundation models by showing model improvements when training on MD simulations. Models, code, and training data are made publicly available.
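To make the pose ranking metric concrete, the following is a minimal sketch of how a top-pose success rate could be computed for any scoring function, whether a physics-based MMGBSA calculation or a learned surrogate. It is not taken from the released SurGBSA code: the function name `top1_success_rate`, the 2.0 Å RMSD cutoff, the convention that a lower score means a more favorable pose, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# One pose = (score, RMSD to the native pose in angstroms).
# Assumption: a lower (more negative) score means a more favorable pose.
Pose = Tuple[float, float]


def top1_success_rate(
    complexes: Dict[str, List[Pose]], rmsd_cutoff: float = 2.0
) -> float:
    """Fraction of complexes whose best-scored pose lies within
    rmsd_cutoff of the native pose (the cutoff value is an assumption)."""
    if not complexes:
        raise ValueError("no complexes given")
    hits = 0
    for poses in complexes.values():
        # Select the pose the scoring function ranks first.
        _, best_rmsd = min(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(complexes)


# Hypothetical toy data, not results from the paper.
toy = {
    "complex_a": [(-42.1, 0.8), (-35.0, 4.2), (-30.5, 6.1)],
    "complex_b": [(-28.3, 5.5), (-27.9, 1.2)],
}
rate = top1_success_rate(toy)
```

Running the same function on scores from two different scoring methods and subtracting the resulting rates gives a percentage-point difference of the kind quoted above, although the exact protocol used in the paper may differ.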