🤖 AI Summary
Current methods for predicting protein conformational ensembles are limited by the scarcity of high-quality conformational data, and existing pipelines rely on a two-stage workflow in which atomic models are first built into cryo-EM maps and then used to train ensemble predictors. This work proposes CryoSampler, an approach that fine-tunes the pretrained static structure prediction model Boltz-2 directly on raw cryo-EM density maps, so that atomic conformational ensembles are generated from an input ensemble of maps without the intermediate model-building stage. The approach is presented as the first adaptation of a pretrained protein structure model to cryo-EM density data and is reported to achieve higher model-building accuracy than prior atomic modeling methods. It also shows preliminary evidence of in-domain generalization: after fine-tuning, it samples diverse conformations for unseen sequences within the same protein family, even without corresponding density maps.
📝 Abstract
Knowledge of a protein's atomic conformational ensemble is critical to determining its function, yet state-of-the-art ensemble prediction models are limited by a lack of high-quality conformational data from simulation or experiment. Recent advances in heterogeneous reconstruction for cryo-electron microscopy (cryo-EM) have enabled scientists to visualize ensembles of density maps for larger proteins and complexes not typically accessible through simulation, but building atomic models into these maps remains a challenge. Traditionally, ensemble prediction models are trained via a two-stage process: experimental density maps are converted into atomic structural ensembles through model building, after which these structures are used to train sequence-to-atomic ensemble predictors. In this work, we propose a new principle for fine-tuning pre-trained static structure prediction models such as Boltz-2 directly on raw cryo-EM maps, bypassing the two-stage process. We apply this technique to the problem of atomic model building by fine-tuning Boltz-2 to generate atomic conformations from an input ensemble of cryo-EM maps, achieving superior model building accuracy compared to prior work. Beyond overfitting to individual map ensembles, our method, CryoSampler, also shows preliminary evidence of in-domain generalization after fine-tuning, sampling diverse atomic conformations for unseen sequences within the same protein family without requiring cryo-EM data. These capabilities indicate that CryoSampler holds the potential to train next-generation atomic ensemble prediction models directly on raw cryo-EM measurements.