Learning conformational ensembles of proteins based on backbone geometry

📅 2025-02-19
🏛️ arXiv.org
📈 Citations: 2
Influential: 1
🤖 AI Summary
Existing deep generative methods for sampling protein conformations rely on fine-tuning pretrained folding models and on evolutionary sequence information, which limits their applicability and efficiency and can introduce biases. BBFlow is a flow matching model that samples protein conformations based solely on backbone geometry. It takes a geometric encoding of the backbone equilibrium structure as input and conditions both the flow and the prior distribution on that structure, which removes the need for evolutionary information. According to the abstract, the model is orders of magnitude faster than current state-of-the-art approaches at comparable accuracy, transfers to multi-chain proteins, and can be trained from scratch in a few GPU days. It is reported to perform competitively, with reduced inference time, on an established benchmark of naturally occurring proteins and on de novo proteins, for which evolutionary information is scarce or absent.

📝 Abstract
Deep generative models have recently been proposed for sampling protein conformations from the Boltzmann distribution, as an alternative to often prohibitively expensive Molecular Dynamics simulations. However, current state-of-the-art approaches rely on fine-tuning pre-trained folding models and evolutionary sequence information, limiting their applicability and efficiency, and introducing potential biases. In this work, we propose BBFlow, a flow matching model for sampling protein conformations based solely on backbone geometry. We introduce a geometric encoding of the backbone equilibrium structure as input and propose to condition not only the flow but also the prior distribution on the respective equilibrium structure, eliminating the need for evolutionary information. The resulting model is orders of magnitude faster than current state-of-the-art approaches at comparable accuracy, is transferable to multi-chain proteins, and can be trained from scratch in a few GPU days. In our experiments, we demonstrate that the proposed model achieves competitive performance with reduced inference time, not only on an established benchmark of naturally occurring proteins but also on de novo proteins, for which evolutionary information is scarce or absent. BBFlow is available at https://github.com/graeter-group/bbflow.
Problem

Research questions and friction points this paper is trying to address.

Sampling protein conformations without evolutionary sequence information
Replacing expensive Molecular Dynamics simulations for protein analysis
Overcoming limitations of current approaches requiring fine-tuned models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow matching model for protein conformation sampling
Geometric encoding of backbone structure as input
Eliminates evolutionary information dependency
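
As a rough illustration of what conditioning both the prior and the flow on an equilibrium structure can look like, the NumPy sketch below builds a generic flow matching setup on plain Cartesian coordinates. Everything in it is an assumption made for illustration: the function names, the Gaussian prior centred on the equilibrium coordinates, the noise scale `sigma`, the linear interpolant, and the hand-written `toy_field` that stands in for a trained network. It does not reproduce the paper's geometric encoding or architecture; the actual implementation is in the repository linked in the abstract.

```python
import numpy as np


def sample_conditional_prior(x_eq, sigma, rng):
    """Draw a prior sample centred on the equilibrium coordinates.

    Assumption: an isotropic Gaussian around x_eq; the paper's prior may differ.
    """
    return x_eq + sigma * rng.standard_normal(x_eq.shape)


def flow_matching_pair(x0, x1, t):
    """Return the linear interpolant x_t and its target velocity x1 - x0.

    A network v(x_t, t, x_eq) would be regressed onto this velocity.
    """
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def toy_field(x, t, x_eq):
    """Hypothetical stand-in for a trained, equilibrium-conditioned network.

    This is the exact field for the degenerate case where every target
    sample equals x_eq; a real model would produce a whole ensemble.
    """
    return (x_eq - x) / (1.0 - t)


def euler_sample(vector_field, x_eq, sigma, n_steps, rng):
    """Integrate dx/dt = v(x, t, x_eq) from t=0 to t=1 with explicit Euler."""
    x = sample_conditional_prior(x_eq, sigma, rng)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        # t never reaches 1 inside the loop, so toy_field stays finite.
        x = x + dt * vector_field(x, k * dt, x_eq)
    return x
```

Calling `euler_sample(toy_field, x_eq, 0.5, 20, np.random.default_rng(0))` with an `(N, 3)` array `x_eq` starts from noise around the equilibrium coordinates and follows the field to the target. Starting the flow near the equilibrium structure rather than from an uninformed prior is one plausible reason such a scheme could need fewer integration steps, though the sketch itself does not demonstrate that.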
Nicolas Wolf
Max Planck Institute for Polymer Research, Mainz, Germany
Leif Seute
Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Vsevolod Viliuga
SciLifeLab and DBB at Stockholm University, Stockholm, Sweden
Simon Wagner
IWR, Heidelberg University, Heidelberg, Germany
Jan Stühmer
Heidelberg Institute for Theoretical Studies (HITS), Karlsruhe Institute of Technology (KIT)
Computer Vision, Artificial Intelligence, Machine Learning
Frauke Gräter
Max Planck Institute for Polymer Research, Mainz, Germany