🤖 AI Summary
This study addresses the problem of completing the missing regions of partial structural templates in X-ray crystallography. We propose a phase-improvement method that combines experimental diffraction data with deep learning. Two inputs are used jointly: Patterson maps, which are computed from the measured diffraction intensities (squared structure-factor amplitudes) and require no phase information, and partial-structure template maps derived from AlphaFold-predicted structures with residues omitted. Both are fed to CrysFormer, a hybrid 3D architecture combining a vision transformer with convolutional layers, which predicts electron density maps; standard crystallographic refinement then turns these maps into atomic models. The key contribution is a tighter coupling of classical crystallographic practice (experimental diffraction data, Patterson analysis, and refinement protocols) with ML-based electron density prediction, giving a workflow that runs from diffraction data to atomic coordinates. Evaluated on a dataset of small protein fragments taken from Protein Data Bank entries and placed in hypothetical crystal settings, the method improves the agreement of the electron density maps with the ground-truth structures, completes the regions missing from the partial templates, and improves the phases of the structure factors.
📝 Abstract
Protein structure determination has long been one of the primary challenges of structural biology, to which deep machine learning (ML)-based approaches have increasingly been applied. However, these ML models generally do not directly incorporate experimental measurements such as X-ray crystallographic diffraction data. To this end, we explore an approach that more tightly couples traditional crystallographic methods with recent ML-based methods, by training a hybrid 3D vision transformer and convolutional network on inputs from both domains. We make use of two distinct input constructs: Patterson maps, which are directly obtainable from crystallographic data, and "partial structure" template maps derived from predicted structures deposited in the AlphaFold Protein Structure Database with residues subsequently omitted. With these, we predict electron density maps that are then post-processed into atomic models through standard crystallographic refinement. We introduce an initial dataset of small protein fragments taken from Protein Data Bank entries and placed in hypothetical crystal settings, and demonstrate that our method is effective at improving the phases of the crystallographic structure factors, at completing the regions missing from partial structure templates, and at improving the agreement of the electron density maps with the ground truth atomic structures.
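
To illustrate why Patterson maps are "directly obtainable from crystallographic data", the following is a minimal sketch, not the authors' pipeline. It assumes a periodic density sampled on a regular grid in a P1 cell; the function name `patterson_map` and the toy two-atom density are hypothetical. The Patterson map is the inverse Fourier transform of the squared structure-factor amplitudes, so it needs no phases, and it equals the autocorrelation of the electron density.

```python
import numpy as np


def patterson_map(density: np.ndarray) -> np.ndarray:
    """Patterson map of a periodic density grid (illustrative only).

    In a real experiment the intensities |F|^2 are measured directly;
    here they are simulated from a known density for demonstration.
    """
    structure_factors = np.fft.fftn(density)        # complex F(hkl)
    intensities = np.abs(structure_factors) ** 2    # phases are discarded here
    # Inverse transform of |F|^2 = circular autocorrelation of the density.
    return np.fft.ifftn(intensities).real


# Toy example: two point "atoms" with weights 1 and 2 on an 8x8x8 grid.
rho = np.zeros((8, 8, 8))
rho[1, 1, 1] = 1.0
rho[3, 2, 1] = 2.0

P = patterson_map(rho)

# Origin peak: sum of squared weights. Interatomic peaks: product of weights
# at +/- the interatomic vector (2, 1, 0), wrapped periodically.
print(P[0, 0, 0], P[2, 1, 0], P[-2, -1, 0])
```

Shifting the whole density within the cell leaves the Patterson map unchanged, since only interatomic vectors survive once the phases are dropped. That lost positional information is what a second input, such as a partial structure template, can help supply.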