🤖 AI Summary
Existing protein conformational sampling methods rely either on evolutionary information or on pretrained folding models, which limits their applicability and efficiency and can introduce biases. To address these limitations, we propose BBFlow, the first flow-matching generative model that operates exclusively on backbone geometry. It requires neither evolutionary sequence information nor pretrained models, and is trained from scratch to sample conformational ensembles consistent with the Boltzmann distribution. BBFlow uses the equilibrium backbone geometry both to condition the vector field and to define a learnable SE(3)-equivariant prior distribution, which makes it transferable to multi-chain proteins and applicable to de novo proteins, for which evolutionary information is scarce or absent. Compared to state-of-the-art methods, BBFlow is orders of magnitude faster at comparable accuracy and can be trained in a few GPU days, while remaining competitive on benchmarks of both naturally occurring and de novo proteins.
📝 Abstract
Deep generative models have recently been proposed for sampling protein conformations from the Boltzmann distribution, as an alternative to often prohibitively expensive Molecular Dynamics simulations. However, current state-of-the-art approaches rely on fine-tuning pre-trained folding models and on evolutionary sequence information, which limits their applicability and efficiency and introduces potential biases. In this work, we propose BBFlow, a flow matching model for sampling protein conformations based solely on backbone geometry. We introduce a geometric encoding of the backbone equilibrium structure as input and propose to condition not only the flow but also the prior distribution on the respective equilibrium structure, eliminating the need for evolutionary information. The resulting model is orders of magnitude faster than current state-of-the-art approaches at comparable accuracy, is transferable to multi-chain proteins, and can be trained from scratch in a few GPU days. In our experiments, we demonstrate that the proposed model achieves competitive performance with reduced inference time, not only on an established benchmark of naturally occurring proteins but also on de novo proteins, for which evolutionary information is scarce or absent. BBFlow is available at https://github.com/graeter-group/bbflow.
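The idea of conditioning both the flow and the prior on the equilibrium structure can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the BBFlow implementation: it treats the backbone as plain 3D coordinates rather than residue frames, uses a Gaussian prior centered on the equilibrium structure, and uses a linear interpolation path. All function names (`sample_prior`, `interpolate`, `target_velocity`, `flow_matching_loss`, `integrate`) are hypothetical and do not come from the BBFlow repository.

```python
import numpy as np


def sample_prior(x_eq, sigma, rng):
    """Assumed prior: Gaussian noise around the equilibrium coordinates x_eq."""
    return x_eq + sigma * rng.standard_normal(x_eq.shape)


def interpolate(x0, x1, t):
    """Point at time t on the straight line from prior sample x0 to data sample x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the linear path; the regression target in flow matching."""
    return x1 - x0


def flow_matching_loss(v_pred, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    return float(np.mean((v_pred - target_velocity(x0, x1)) ** 2))


def integrate(v_fn, x0, x_eq, n_steps=10):
    """Euler integration of dx/dt = v_fn(x, t, x_eq) from t = 0 to t = 1.

    v_fn stands in for a trained network that receives the equilibrium
    structure x_eq as conditioning input.
    """
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v_fn(x, k * dt, x_eq)
    return x
```

In training, a network would be fit to minimize `flow_matching_loss` at points given by `interpolate`; at inference, `integrate` transports a sample from the structure-conditioned prior to a conformation. Starting from a prior near the equilibrium structure, rather than from unstructured noise, shortens the path the flow has to cover.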