Refold: Refining Protein Inverse Folding with Efficient Structural Matching and Fusion

📅 2026-03-15

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the limitations of existing approaches to protein inverse folding. Template-based methods depend on the coverage of structural databases and degrade on out-of-distribution targets. Deep learning models generalize better but struggle to capture local structural details, which leads to high uncertainty in residue-level predictions. To address both problems, the authors propose a dynamic fusion framework that integrates structural priors with deep learning. The method uses efficient structural matching to retrieve template information and introduces a dynamic utility gating mechanism that adaptively controls how much prior knowledge is injected: it strengthens predictions when the prior is reliable and reverts to the base model otherwise. On both the CATH 4.2 and CATH 4.3 benchmarks, the approach achieves a state-of-the-art native sequence recovery of 63%, with larger gains in high-uncertainty regions.
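The two quantities the summary reports can be made concrete. Native sequence recovery is the fraction of positions at which a designed sequence matches the native sequence of the backbone. Residue-level uncertainty is commonly measured as the entropy of the predicted amino-acid distribution at each position. The sketch below is illustrative only: the function names are hypothetical, and the rule that flags the highest-entropy fraction of residues as "high-uncertainty" is an assumption, not the paper's definition.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues

def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue."""
    if len(designed) != len(native):
        raise ValueError("sequences must cover the same backbone positions")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

def residue_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of each row of an (L, 20) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)

def high_uncertainty_mask(probs: np.ndarray, quantile: float = 0.8) -> np.ndarray:
    """Hypothetical rule: flag residues whose entropy is at or above the given quantile."""
    h = residue_entropy(probs)
    return h >= np.quantile(h, quantile)

def recovery_on_mask(designed: str, native: str, mask: np.ndarray) -> float:
    """Recovery restricted to the flagged positions (NaN if none are flagged)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return float("nan")
    return float(np.mean([designed[i] == native[i] for i in idx]))
```

For example, `sequence_recovery("ACDE", "ACDF")` returns 0.75. Per-protein values like this are then aggregated over a benchmark's test split to produce a headline figure.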


📝 Abstract

Protein inverse folding aims to design an amino acid sequence that will fold into a given backbone structure, and it is a central task in protein design. Two main paradigms have been widely explored. Template-based methods exploit database-derived structural priors and can achieve high local precision when close structural neighbors are available, but their dependence on database coverage and match quality often degrades performance on out-of-distribution (OOD) targets. Deep learning approaches, in contrast, learn general structure-to-sequence regularities and usually generalize better to new backbones. However, they struggle to capture fine-grained local structure, which can cause uncertain residue predictions and missed local motifs in ambiguous regions. We introduce Refold, a novel framework that synergistically integrates the strengths of database-derived structural priors and deep learning prediction to enhance inverse folding. Refold obtains structural priors from matched neighbors and fuses them with model predictions to refine residue probabilities. In practice, low-quality neighbors can introduce noise and potentially degrade model performance. We address this issue with a Dynamic Utility Gate that controls prior injection and falls back to the base prediction when the priors are untrustworthy. Comprehensive evaluations on standard benchmarks demonstrate that Refold achieves state-of-the-art native sequence recovery of 0.63 on both CATH 4.2 and CATH 4.3. Further analysis indicates that Refold delivers larger gains on high-uncertainty regions, reflecting the complementarity between structural priors and deep learning predictions.
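The fusion step described in the abstract can be sketched as a per-residue mixture of two distributions over the 20 amino acids: the base model's prediction and a prior built from matched structural neighbors, combined through a gate in [0, 1] that drops to zero when the prior is untrustworthy. This is a minimal sketch under stated assumptions, not Refold's implementation. The abstract does not specify how the prior is built, how the Dynamic Utility Gate computes its value, or whether fusion happens in probability or log space. Here the prior is a match-score-weighted vote over aligned neighbors, the gate is a hand-written heuristic that combines base-model entropy with a hypothetical match-quality threshold, and fusion is a convex mixture. All names and parameters (`neighbor_prior`, `utility_gate`, `fuse`, `quality_threshold`, `sharpness`) are illustrative.

```python
import numpy as np

def neighbor_prior(neighbor_onehots, match_scores, eps=1e-6):
    """Hypothetical prior: match-score-weighted average of aligned neighbor residues.

    neighbor_onehots: (K, L, 20) one-hot residues of K neighbors aligned to the query.
    match_scores:     (K,) structural match quality in [0, 1] (assumed scale).
    """
    w = np.asarray(match_scores, dtype=float)
    prior = np.tensordot(w, neighbor_onehots, axes=1)  # weighted sum over K -> (L, 20)
    prior = prior + eps  # smoothing so no residue gets zero mass
    return prior / prior.sum(axis=1, keepdims=True)

def utility_gate(base_probs, prior_quality, quality_threshold=0.5, sharpness=10.0):
    """Hypothetical per-residue gate in [0, 1] (a heuristic stand-in, not a learned gate).

    Opens wider where the base model is uncertain (high entropy) and the prior looks
    trustworthy; returns all zeros (fall back to base) below the quality threshold.
    """
    if prior_quality < quality_threshold:
        return np.zeros(base_probs.shape[0])
    p = np.clip(base_probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    uncertainty = np.clip(entropy / np.log(base_probs.shape[1]), 0.0, 1.0)
    trust = 1.0 / (1.0 + np.exp(-sharpness * (prior_quality - quality_threshold)))
    return uncertainty * trust

def fuse(base_probs, prior_probs, gate):
    """Per-residue convex mixture; a zero gate returns the base prediction unchanged."""
    g = gate[:, None]
    return (1.0 - g) * base_probs + g * prior_probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, K = 5, 3
    base = rng.dirichlet(np.ones(20), size=L)                # (L, 20) base predictions
    neighbors = np.eye(20)[rng.integers(0, 20, size=(K, L))]  # (K, L, 20) one-hot
    scores = np.array([0.9, 0.7, 0.4])                        # made-up match scores
    prior = neighbor_prior(neighbors, scores)
    gate = utility_gate(base, prior_quality=scores.max())
    refined = fuse(base, prior, gate)
    print(np.round(gate, 3))
```

A convex mixture keeps every row a valid probability distribution, and a zero gate reproduces the base prediction exactly, which mirrors the fallback behavior the abstract describes. A learned gate or log-space fusion would be equally plausible designs.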

Problem

Research questions and friction points this paper is trying to address.

protein inverse folding

structural priors

deep learning

out-of-distribution

local structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

protein inverse folding

structural priors

deep learning