Learning conformational ensembles of proteins based on backbone geometry

📅 2025-02-19

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 1

🤖 AI Summary

Existing methods for sampling protein conformations rely on evolutionary sequence information and on fine-tuning pretrained folding models, which limits their applicability and efficiency and can introduce biases. BBFlow is a flow matching generative model that operates solely on backbone geometry: it needs neither evolutionary information nor a pretrained model, and it is trained from scratch to sample conformations from the Boltzmann distribution. A geometric encoding of the equilibrium backbone structure conditions both the flow and the prior distribution, and the model transfers to multi-chain proteins. Compared with current state-of-the-art methods, BBFlow is orders of magnitude faster at comparable accuracy, can be trained in a few GPU days, and performs competitively on both an established benchmark of naturally occurring proteins and on de novo proteins, for which evolutionary information is scarce or absent.

📝 Abstract

Deep generative models have recently been proposed for sampling protein conformations from the Boltzmann distribution, as an alternative to often prohibitively expensive Molecular Dynamics simulations. However, current state-of-the-art approaches rely on fine-tuning pre-trained folding models and evolutionary sequence information, limiting their applicability and efficiency, and introducing potential biases. In this work, we propose BBFlow, a flow matching model for sampling protein conformations based solely on backbone geometry. We introduce a geometric encoding of the backbone equilibrium structure as input and propose to condition not only the flow but also the prior distribution on the respective equilibrium structure, eliminating the need for evolutionary information. The resulting model is orders of magnitude faster than current state-of-the-art approaches at comparable accuracy, is transferable to multi-chain proteins, and can be trained from scratch in a few GPU days. In our experiments, we demonstrate that the proposed model achieves competitive performance with reduced inference time, across not only an established benchmark of naturally occurring proteins but also de novo proteins, for which evolutionary information is scarce or absent. BBFlow is available at https://github.com/graeter-group/bbflow.
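
To make the idea of conditioning both the flow and the prior on the equilibrium structure more concrete, here is a minimal sketch of flow matching in plain Euclidean coordinates. It is not the BBFlow implementation. The Gaussian prior centred on the equilibrium coordinates, the linear interpolant, the flat coordinate lists, and all function names (`sample_prior`, `flow_matching_loss`, `sample_conformation`) are assumptions made for illustration. The `model` argument stands in for any learned vector field that takes the current coordinates, the time, and the equilibrium structure.

```python
import random

def sample_prior(x_eq, sigma, rng):
    """Prior sample: equilibrium coordinates plus Gaussian noise (assumed form)."""
    return [c + sigma * rng.gauss(0.0, 1.0) for c in x_eq]

def interpolate(x0, x1, t):
    """Linear path from the prior sample x0 (t=0) to the data sample x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def flow_matching_loss(model, x_eq, x1, sigma, rng):
    """Squared error between the predicted velocity and the target x1 - x0."""
    t = rng.random()
    x0 = sample_prior(x_eq, sigma, rng)
    xt = interpolate(x0, x1, t)
    v_pred = model(xt, t, x_eq)  # the vector field is conditioned on x_eq
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - q) ** 2 for p, q in zip(v_pred, target)) / len(target)

def sample_conformation(model, x_eq, sigma, rng, n_steps=10):
    """Euler integration of the vector field, starting from a prior sample."""
    x = sample_prior(x_eq, sigma, rng)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = model(x, k * dt, x_eq)
        x = [c + dt * vc for c, vc in zip(x, v)]
    return x
```

Because the prior sample already starts near the equilibrium structure, the flow in this sketch only has to transport a small perturbation, which is one intuition for why a structure-conditioned prior might reduce the work done at inference time.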

Problem

Research questions and friction points this paper is trying to address.

Sampling protein conformations without evolutionary sequence information

Replacing expensive Molecular Dynamics simulations for protein analysis

Overcoming limitations of current approaches requiring fine-tuned models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow matching model for protein conformation sampling

Geometric encoding of backbone structure as input

Eliminates evolutionary information dependency
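
As an illustration of what a purely geometric description of a backbone can look like, the sketch below computes the standard backbone torsion angles (phi, psi, omega) from N, CA, and C atom coordinates. This is only one possible descriptor and is an assumption for illustration: the summary above does not specify which geometric encoding BBFlow uses. The function names `dihedral` and `backbone_torsions` are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)

def backbone_torsions(n, ca, c):
    """Per-residue (phi, psi, omega); None where an angle is undefined at a chain end."""
    length = len(ca)
    out = []
    for i in range(length):
        phi = dihedral(c[i - 1], n[i], ca[i], c[i]) if i > 0 else None
        psi = dihedral(n[i], ca[i], c[i], n[i + 1]) if i < length - 1 else None
        omega = dihedral(ca[i], c[i], n[i + 1], ca[i + 1]) if i < length - 1 else None
        out.append((phi, psi, omega))
    return out
```

Torsion angles do not change under global rotation or translation of the structure, so a descriptor of this kind depends only on the shape of the backbone and requires no sequence or evolutionary input.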

🔎 Similar Papers

AlphaFolding: 4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance