🤖 AI Summary
Predicting drug-protein unbinding kinetics has long been hindered by data scarcity and model bias. To address this, we introduce a dissociation-pathway-oriented enhanced sampling strategy and use it to generate 26,612 high-quality unbinding trajectories (about 13 million frames), establishing DD-13M, the first open-source, large-scale unbinding trajectory dataset. Using DD-13M, we train UnbindingFlow, the first equivariant generative model for unbinding path prediction, which efficiently generates collision-free, geometrically valid dissociation pathways while reproducing experimentally observed kinetic trends. The three key contributions are: (1) the first dissociation-directed enhanced sampling protocol; (2) the first multi-target, multi-ligand unbinding trajectory benchmark dataset (DD-13M); and (3) the first equivariant generative framework tailored to modeling the unbinding process, which improves pathway validity and kinetic prediction fidelity.
📝 Abstract
Drug-protein binding and dissociation dynamics are fundamental to understanding molecular interactions in biological systems. Although many tools for studying drug-protein interactions have emerged, especially artificial intelligence (AI)-based generative models, tools for predicting binding and dissociation kinetics and dynamics remain limited. To address this, we propose a research paradigm that combines molecular dynamics (MD) simulations, enhanced sampling, and AI generative models. We first develop an enhanced sampling strategy that efficiently samples the drug-protein dissociation process in MD simulations and estimates the free energy surface (FES). We then build an MD simulation pipeline around this strategy and use it to generate a dataset of 26,612 drug-protein dissociation trajectories containing about 13 million frames. We name this dissociation dynamics dataset DD-13M and use it to train UnbindingFlow, a deep equivariant generative model that can generate collision-free dissociation trajectories. The DD-13M dataset and the UnbindingFlow model represent a significant advance in computational structural biology, and we anticipate their broad applicability in machine learning studies of drug-protein interactions. Our ongoing efforts focus on extending this methodology to a broader spectrum of drug-protein complexes and on exploring novel applications in pathway prediction.
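To make the term "collision-free" concrete, the following is a minimal sketch of one way a generated dissociation trajectory could be screened for steric clashes. It is not the authors' validity criterion: the function names, the 2.0 Å cutoff, and the simplification of treating the protein as a rigid set of atom coordinates are all assumptions made here for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]  # (x, y, z) coordinates in angstroms


def min_distance(ligand: Sequence[Point], protein: Sequence[Point]) -> float:
    """Smallest ligand-atom to protein-atom distance in a single frame."""
    return min(math.dist(a, b) for a in ligand for b in protein)


def is_collision_free(
    trajectory: Sequence[Sequence[Point]],
    protein: Sequence[Point],
    cutoff: float = 2.0,  # hypothetical clash threshold, not taken from the paper
) -> bool:
    """True if no ligand atom comes closer than `cutoff` to any protein atom
    in any frame. The protein is assumed rigid (one fixed set of coordinates)."""
    return all(min_distance(frame, protein) >= cutoff for frame in trajectory)


# Toy example: two protein atoms and a one-atom "ligand".
protein = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]

# The ligand moves straight away from the protein along z.
leaving = [[(0.0, 0.0, z)] for z in (3.0, 4.0, 5.0)]

# The second frame passes too close to the first protein atom.
clashing = [[(0.0, 0.0, 3.0)], [(0.5, 0.0, 1.0)], [(0.0, 0.0, 5.0)]]

leaving_ok = is_collision_free(leaving, protein)
clashing_ok = is_collision_free(clashing, protein)
```

A realistic check would use per-element van der Waals radii and the protein conformation of each frame rather than a single fixed cutoff and a rigid receptor.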