Mask prior-guided denoising diffusion improves inverse protein folding

📅 2024-12-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Inverse protein folding remains difficult in highly uncertain regions, such as loops and intrinsically disordered segments, where sequence prediction accuracy is low and uncertainty is poorly calibrated. To address this, we propose MapDiff, a discrete denoising diffusion framework that conditions sequence generation on the protein backbone and iteratively refines residue assignments. Its graph-based denoising network is pre-trained with a mask prior to capture structural and residue interactions, and the generative process combines the denoising diffusion implicit model with Monte-Carlo dropout to improve uncertainty estimation. Evaluated on four sequence design benchmarks, it significantly outperforms state-of-the-art methods. The generated sequences closely resemble the physicochemical and structural characteristics of native proteins across different protein families and architectures.

📝 Abstract
Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep-learning advances showing significant potential and competitive performance. However, challenges remain in predicting highly uncertain regions, such as those with loops and disorders. To tackle such low-confidence residue prediction, we propose a Mask prior-guided denoising Diffusion (MapDiff) framework that accurately captures both structural and residue interactions for inverse protein folding. MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise, conditioned on a given protein backbone. To incorporate structural and residue interactions, we develop a graph-based denoising network with a mask prior pre-training strategy. Moreover, in the generative process, we combine the denoising diffusion implicit model with Monte-Carlo dropout to improve uncertainty estimation. Evaluation on four challenging sequence design benchmarks shows that MapDiff significantly outperforms state-of-the-art methods. Furthermore, the in-silico sequences generated by MapDiff closely resemble the physico-chemical and structural characteristics of native proteins across different protein families and architectures.
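To make the idea of a discrete diffusion process over amino acid sequences more concrete, the following is a minimal sketch of the forward (noising) step only. It is not the authors' implementation: the uniform transition kernel, the linear schedule `alpha_bar`, and the function name `corrupt` are all assumptions chosen for illustration, and the backbone-conditioned denoising network is omitted entirely.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def alpha_bar(t, num_steps):
    """Hypothetical linear schedule: probability a residue is kept at step t."""
    return 1.0 - t / num_steps


def corrupt(sequence, t, num_steps, rng):
    """Sample a noisy sequence x_t given the clean sequence x_0.

    Assumes a uniform transition kernel: each residue is kept with
    probability alpha_bar(t) and otherwise resampled uniformly from
    the 20 amino acids.
    """
    keep = alpha_bar(t, num_steps)
    noisy = []
    for residue in sequence:
        if rng.random() < keep:
            noisy.append(residue)
        else:
            noisy.append(rng.choice(AMINO_ACIDS))
    return "".join(noisy)


rng = random.Random(0)
native = "MKTAYIAKQR"  # toy sequence, not taken from any benchmark
for t in (0, 5, 10):
    print(t, corrupt(native, t, num_steps=10, rng=rng))
```

At `t = 0` the sequence is returned unchanged, and at the final step every position is drawn uniformly at random. A denoising model of the kind the abstract describes would be trained to reverse this corruption, conditioned on the backbone structure.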
Problem

Research questions and friction points this paper is trying to address.

Improves inverse protein folding accuracy
Predicts low-confidence disordered protein regions
Generates native-like protein sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask-prior-guided denoising diffusion model
Graph-based denoising network with mask-prior
Combines denoising diffusion with Monte-Carlo dropout
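As a rough illustration of the Monte-Carlo dropout idea named in the last item, the sketch below keeps dropout active at inference time and averages the predicted residue probabilities over several stochastic forward passes. The one-layer `toy_denoiser`, its weights, and the entropy-based uncertainty score are hypothetical stand-ins, not the paper's network or its sampling procedure.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_denoiser(features, weights, drop_prob, rng):
    """Hypothetical stand-in for a denoising network: one linear layer
    with inverted dropout on its inputs, left active at inference."""
    scale = 1.0 / (1.0 - drop_prob)
    kept = [f * scale if rng.random() >= drop_prob else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, kept)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, drop_prob, num_passes, rng):
    """Average class probabilities over several stochastic forward passes
    and report the entropy of the averaged distribution."""
    mean = [0.0] * len(weights)
    for _ in range(num_passes):
        probs = toy_denoiser(features, weights, drop_prob, rng)
        mean = [m + p / num_passes for m, p in zip(mean, probs)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


rng = random.Random(0)
features = [0.5, -1.0, 2.0]  # toy per-residue features
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.3, 0.3, 0.3]]  # 3 toy classes
mean_probs, entropy = mc_dropout_predict(features, weights, 0.2, 50, rng)
print(mean_probs, entropy)
```

A higher entropy of the averaged distribution indicates a position where the stochastic passes disagree, which is one simple way to flag low-confidence residues.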
Peizhen Bai
School of Computer Science, University of Sheffield, Sheffield, United Kingdom
Filip Miljković
Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
Xianyuan Liu
University of Sheffield
Deep Learning, Materials Design, Machine Learning
Leonardo De Maria
Medicinal Chemistry, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
Rebecca Croasdale-Wood
Biologics Engineering, Oncology R&D, AstraZeneca, Cambridge, United Kingdom
Owen Rackham
School of Biological Sciences, University of Southampton, Southampton, United Kingdom
Haiping Lu
Professor of Machine Learning, University of Sheffield
Machine learning, Multimodal AI, AI4Health, AI4Science, Open-source software