🤖 AI Summary
X-ray crystallography cannot directly reconstruct electron density maps from diffraction data because of the phase problem: only structure-factor amplitudes are measured, and the phases are lost. Conventional structure determination therefore often relies on time-consuming manual refinement. This work proposes CrystalBoltz, an end-to-end, experimentally guided, diffusion-based generative framework that formulates structure refinement as Bayesian inference. Starting from a pretrained protein structure prior, the method moves from unguided generation to experiment-guided posterior sampling that operates directly in structure-factor amplitude space, without explicit phase retrieval, and then refines atomic coordinates and B-factors. Across multiple protein crystallography datasets, it achieves lower coordinate RMSD and lower R-factors than the strongest baselines considered, while reducing runtime by a factor of 33 relative to existing experimentally guided refinement.
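The phase problem and the R-factor mentioned above can be illustrated with a short NumPy sketch. This is not code from the paper. It computes complex structure factors F(h) = Σ_j f_j exp(-B_j s²/4) exp(2πi h·x_j) for point scatterers. It then shows that an origin shift changes every phase while leaving all amplitudes |F(h)| unchanged, and it computes a conventional R-factor with a single least-squares scale. Assumptions: space group P1, an orthorhombic cell, constant per-atom weights f_j instead of resolution-dependent form factors, isotropic B-factors, and no bulk-solvent model. The helper names `structure_factors` and `r_factor` are hypothetical.

```python
import numpy as np

def structure_factors(frac_xyz, weights, b_factors, hkl, cell_lengths):
    """Complex structure factors F(hkl) for point scatterers in space group P1.

    Illustrative assumptions: orthorhombic cell, a constant scattering weight per
    atom (no resolution-dependent form factor), isotropic B-factors, no solvent.
    """
    hkl = np.asarray(hkl, dtype=float)                                     # (R, 3) Miller indices
    s2 = np.sum((hkl / np.asarray(cell_lengths)) ** 2, axis=1)             # (R,) s^2 = 1/d^2
    debye_waller = np.exp(-np.outer(s2, b_factors) / 4.0)                  # (R, N) exp(-B s^2 / 4)
    phase_args = 2.0 * np.pi * hkl @ np.asarray(frac_xyz, dtype=float).T   # (R, N) 2*pi*h.x_j
    return (weights * debye_waller * np.exp(1j * phase_args)).sum(axis=1)

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|), with a least-squares scale k."""
    f_obs, f_calc = np.abs(f_obs), np.abs(f_calc)
    k = np.dot(f_obs, f_calc) / np.dot(f_calc, f_calc)
    return np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs)

# Toy example: 20 random atoms in a 40 x 50 x 60 Å cell (synthetic data).
rng = np.random.default_rng(0)
cell = (40.0, 50.0, 60.0)
xyz = rng.random((20, 3))       # fractional coordinates
w = np.full(20, 6.0)            # carbon-like weight, illustrative only
b = np.full(20, 20.0)           # B-factors in Å^2
hkl = np.array([(h, k, l) for h in range(-3, 4) for k in range(-3, 4)
                for l in range(0, 4) if (h, k, l) != (0, 0, 0)])

f = structure_factors(xyz, w, b, hkl, cell)
# Shifting the origin by t multiplies each F(h) by exp(2*pi*i*h.t): phases change,
# amplitudes do not, so the amplitudes alone cannot pin down the phases.
f_shifted = structure_factors((xyz + [0.1, 0.2, 0.3]) % 1.0, w, b, hkl, cell)
print(np.allclose(np.abs(f), np.abs(f_shifted)))       # True
print(np.allclose(np.angle(f), np.angle(f_shifted)))   # False
```

Because the R-factor compares only amplitudes, it scores the origin-shifted model as a perfect fit. Any amplitude-based objective inherits this ambiguity, which is why a structural prior is needed to select among models that agree with the data.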
📝 Abstract
Generative models trained on public databases of protein structures, most of which have been determined by X-ray crystallography, now provide powerful priors for structure prediction. However, they are not readily conditioned on the measurements from a new crystallographic experiment, limiting their use for X-ray structure determination. In crystallography, the measured structure-factor amplitudes do not by themselves determine an electron density map or atomic structure because the associated phases are unobserved and must be inferred. Structure determination therefore remains an inverse problem in which candidate models must be both structurally plausible and consistent with measured diffraction data, often requiring substantial manual refinement by human experts. Emerging methods aim to incorporate experimental information more directly into predictive and refinement workflows. We present CrystalBoltz, a generative framework that casts crystallographic refinement as Bayesian inference over atomic structures and operates directly on structure-factor amplitudes. CrystalBoltz moves from unguided generation with a pre-trained prior over protein structures to experiment-guided posterior sampling, followed by atomic coordinate and B-factor refinement. Across multiple protein crystallography datasets, CrystalBoltz attains lower coordinate RMSD and lower R-factors than the strongest baselines considered, while reducing runtime by a factor of 33 relative to existing experimentally guided refinement.
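The Bayesian framing in the abstract can be made concrete with a toy model. By Bayes' rule, p(x | |F_obs|) ∝ p(|F_obs| | x) p(x), so the posterior score is the prior score plus the gradient of the log-likelihood. Adding that gradient is what turns unguided generation into experiment-guided sampling. The sketch below is a minimal NumPy illustration under loudly stated assumptions, not CrystalBoltz's algorithm:

- an isotropic Gaussian around a reference model replaces the pretrained protein prior;
- a Gaussian noise model on amplitudes serves as the likelihood;
- unadjusted Langevin dynamics replaces the diffusion sampler;
- B-factors are held fixed inside constant scattering weights.

All function and parameter names (`amplitudes_and_grad`, `guided_langevin`, `guidance`, `tau`, `sigma`) are hypothetical.

```python
import numpy as np

def amplitudes_and_grad(x, w, hkl):
    """Amplitudes |F(h)| of point scatterers and their gradient w.r.t. x.

    x: (N, 3) fractional coordinates; w: (N,) fixed scattering weights
    (B-factor attenuation assumed folded in); hkl: (R, 3) Miller indices.
    Returns amp (R,) and grad (R, N, 3) with grad[r, j] = d|F_r| / dx_j.
    """
    terms = w * np.exp(2j * np.pi * hkl @ x.T)   # (R, N) per-atom contributions
    f = terms.sum(axis=1)                         # (R,) complex F(h)
    amp = np.abs(f)
    # d|F|/dx_j = Re(conj(F) * dF/dx_j) / |F|, with dF/dx_j = 2*pi*i * term_j * h
    scalar = np.real(np.conj(f)[:, None] * 2j * np.pi * terms) / np.maximum(amp, 1e-12)[:, None]
    return amp, scalar[:, :, None] * hkl[:, None, :]

def guided_langevin(x0, y_obs, w, hkl, mu, tau, sigma, step, n_steps, guidance, rng):
    """Unadjusted Langevin sampling of p(x | y) ∝ p(y | x)^guidance * p(x).

    Prior (toy stand-in for a learned protein prior): x ~ N(mu, tau^2 I).
    Likelihood: y_obs ~ N(|F(x)|, sigma^2 I) on amplitudes only; no phases used.
    guidance=0 samples the prior alone (unguided generation).
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        prior_score = -(x - mu) / tau**2
        amp, grad = amplitudes_and_grad(x, w, hkl)
        lik_score = np.einsum("r,rnk->nk", (y_obs - amp) / sigma**2, grad)
        score = prior_score + guidance * lik_score
        x = x + step * score + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Synthetic setup: "measured" amplitudes come from a hidden true structure.
rng = np.random.default_rng(1)
hkl = np.array([(h, k, l) for h in range(-2, 3) for k in range(-2, 3)
                for l in range(0, 3) if (h, k, l) != (0, 0, 0)], dtype=float)
n_atoms = 8
w = np.full(n_atoms, 6.0)
x_true = rng.random((n_atoms, 3))
y_obs = amplitudes_and_grad(x_true, w, hkl)[0]
mu = x_true + 0.02 * rng.standard_normal(x_true.shape)   # imperfect prior mean
x_post = guided_langevin(mu, y_obs, w, hkl, mu, tau=0.05, sigma=1.0,
                         step=1e-6, n_steps=500, guidance=1.0, rng=rng)
```

Setting `guidance=0.0` samples the toy prior alone, mirroring the unguided stage. The likelihood only ever compares amplitudes, so no phases are estimated at any point. The step size is kept small relative to the curvature of the amplitude likelihood so the Langevin updates stay stable.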