Mask prior-guided denoising diffusion improves inverse protein folding

📅 2024-12-10

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Inverse protein folding faces challenges in modeling highly uncertain regions—such as loops and intrinsically disordered segments—including low sequence prediction accuracy and poor uncertainty calibration. To address these, we propose the first structure-guided discrete diffusion framework that integrates a masked-prior pre-trained graph neural network with Monte Carlo Dropout, explicitly modeling joint residue-backbone dependencies to enhance uncertainty quantification. Our method conditions sequence generation on the protein backbone and employs a denoising diffusion process for precise residue assignment. Evaluated on four major sequence design benchmarks, it significantly outperforms state-of-the-art methods. Generated sequences faithfully recapitulate native proteins’ physicochemical properties and 3D structural features, while covering diverse folds and protein families—demonstrating strong generalization capability and biological plausibility.

📝 Abstract

Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep-learning advances showing significant potential and competitive performance. However, challenges remain in predicting highly uncertain regions, such as those with loops and disorders. To tackle such low-confidence residue prediction, we propose a Mask prior-guided denoising Diffusion (MapDiff) framework that accurately captures both structural and residue interactions for inverse protein folding. MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise, conditioned on a given protein backbone. To incorporate structural and residue interactions, we develop a graph-based denoising network with a mask prior pre-training strategy. Moreover, in the generative process, we combine the denoising diffusion implicit model with Monte-Carlo dropout to improve uncertainty estimation. Evaluation on four challenging sequence design benchmarks shows that MapDiff significantly outperforms state-of-the-art methods. Furthermore, the in-silico sequences generated by MapDiff closely resemble the physico-chemical and structural characteristics of native proteins across different protein families and architectures.
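
To make the idea of discrete diffusion over amino acid sequences more concrete, the sketch below shows one possible forward (noising) process. The abstract does not say which noise kernel or schedule MapDiff uses, so the uniform-transition kernel, the `betas` schedule, and the function names here are illustrative assumptions, not the authors' implementation. Under this assumed kernel, each residue keeps its original identity with probability equal to the product of `(1 - beta)` over the steps taken, and is otherwise resampled uniformly from the 20 standard amino acids.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cumulative_keep_prob(betas, t):
    """Probability that a residue has not been resampled after t steps."""
    keep = 1.0
    for beta in betas[:t]:
        keep *= 1.0 - beta
    return keep


def noise_sequence(seq, betas, t, rng):
    """Sample a noised sequence x_t from x_0 (assumed uniform-transition kernel).

    Each position keeps its residue with probability cumulative_keep_prob(betas, t)
    and is otherwise replaced by a uniformly random amino acid.
    """
    keep = cumulative_keep_prob(betas, t)
    return "".join(
        aa if rng.random() < keep else rng.choice(AMINO_ACIDS)
        for aa in seq
    )


if __name__ == "__main__":
    rng = random.Random(0)
    betas = [0.05] * 20           # hypothetical constant schedule
    native = "MKTAYIAKQRQISFVK"   # hypothetical example sequence
    for step in (0, 5, 20):
        print(step, noise_sequence(native, betas, step, rng))
```

A denoising network would be trained to reverse this corruption, predicting the original residues from the noised sequence and the backbone structure. In this sketch the backbone conditioning is omitted entirely.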

Problem

Research questions and friction points this paper is trying to address.

Improves inverse protein folding accuracy

Predicts residues in low-confidence regions such as loops and disordered segments

Generates native-like protein sequences

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask-prior-guided denoising diffusion model

Graph-based denoising network with mask-prior pre-training

Combines the denoising diffusion implicit model (DDIM) with Monte-Carlo dropout to improve uncertainty estimation
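
The Monte-Carlo dropout idea listed above can be sketched in a few lines: keep dropout active at inference time, run several stochastic forward passes, and average the resulting per-residue probability distributions. The toy example below applies dropout directly to a vector of logits as a stand-in for a real denoising network. That simplification, the function names, and the use of predictive entropy as the uncertainty score are all assumptions for illustration and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest.

    Requires 0 <= p < 1.
    """
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]


def mc_dropout_predict(logits, p, n_passes, rng):
    """Average class probabilities over n_passes stochastic forward passes."""
    mean = [0.0] * len(logits)
    for _ in range(n_passes):
        probs = softmax(dropout(logits, p, rng))
        mean = [m + q / n_passes for m, q in zip(mean, probs)]
    return mean


def entropy(probs):
    """Predictive entropy in nats; higher means a less certain prediction."""
    return -sum(q * math.log(q) for q in probs if q > 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits over 20 amino acids for a single residue position.
    logits = [rng.uniform(-2.0, 2.0) for _ in range(20)]
    averaged = mc_dropout_predict(logits, p=0.2, n_passes=50, rng=rng)
    print("predictive entropy:", entropy(averaged))
```

In a real model, dropout would act on hidden activations inside the network, and the averaged distribution would be fed into each denoising step. Positions with high entropy would correspond to the low-confidence residues the paper targets.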

🔎 Similar Papers

AlphaFolding: 4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance