Improving Protein Sequence Design through Designability Preference Optimization

📅 2025-05-30

🤖 AI Summary

Existing protein sequence design methods are trained primarily on sequence reconstruction objectives, which do not ensure *designability*, i.e., the likelihood that a generated sequence folds into the target structure. This work introduces Residue-level Designability Preference Optimization (ResiDPO), a structure-aware preference learning framework for *de novo* protein design. ResiDPO applies residue-level structural rewards and decouples optimization across residues, so that it preserves regions that already perform well while improving designability at structurally weak positions. The method builds on Direct Preference Optimization (DPO), uses AlphaFold pLDDT scores as the preference signal, and is applied by fine-tuning LigandMPNN. On a challenging enzyme design benchmark, the resulting model, EnhancedMPNN, raises the *in silico* design success rate from 6.56% to 17.57%, a nearly threefold improvement.

📝 Abstract

Protein sequence design methods have demonstrated strong performance in sequence generation for de novo protein design. However, because their training objective is sequence recovery, they do not guarantee designability, i.e., the likelihood that a designed sequence folds into the desired structure. To bridge this gap, we redefine the training objective by steering sequence generation toward high designability. To do this, we integrate Direct Preference Optimization (DPO), using AlphaFold pLDDT scores as the preference signal, which significantly improves the in silico design success rate. To refine sequence generation at residue-level granularity, we introduce Residue-level Designability Preference Optimization (ResiDPO), which applies residue-level structural rewards and decouples optimization across residues. This enables direct improvement in designability while preserving regions that already perform well. Using a curated dataset with residue-level annotations, we fine-tune LigandMPNN with ResiDPO to obtain EnhancedMPNN, which achieves a nearly 3-fold increase in in silico design success rate (from 6.56% to 17.57%) on a challenging enzyme design benchmark.
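
To make the DPO step in the abstract concrete, here is a minimal sketch of the standard DPO loss on sequence-level log-likelihoods. It is not the paper's implementation: the function names, the `beta` value, and the assumption that each sequence's log-likelihood is summed over residues are illustrative choices. In this setting the "preferred" sequence of a pair would be the one with the higher AlphaFold pLDDT.

```python
import math

def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss averaged over (preferred, rejected) sequence pairs.

    logp_*     : sequence log-likelihoods under the model being fine-tuned
    ref_logp_* : the same quantities under the frozen reference model
    beta       : strength of the implicit KL constraint (value is an assumption)
    """
    losses = []
    for pw, pl, rw, rl in zip(logp_win, logp_lose, ref_logp_win, ref_logp_lose):
        # Implicit reward margin: how much more the policy favors the
        # preferred sequence over the rejected one, relative to the reference.
        margin = beta * ((pw - rw) - (pl - rl))
        losses.append(neg_log_sigmoid(margin))
    return sum(losses) / len(losses)
```

When the fine-tuned model equals the reference, the margin is zero and the loss is log 2. It falls below that as the model shifts probability toward the preferred sequence.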

Problem

Research questions and friction points this paper is trying to address.

Optimize protein sequence design for higher foldability

Enhance designability using AlphaFold pLDDT preference signals

Improve residue-level precision with ResiDPO structural rewards
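
One way pLDDT scores could be turned into a preference signal is sketched below. The summary does not describe how preference pairs are actually constructed, so the pairing rule, the `min_gap` threshold, and the function name are all assumptions made for illustration.

```python
from itertools import combinations

def build_preference_pairs(candidates, min_gap=5.0):
    """Build (preferred, rejected) pairs from sequences designed for one backbone.

    candidates : list of (sequence, mean_plddt) tuples
    min_gap    : hypothetical minimum pLDDT difference for a pair to count
                 as a clear preference; closer pairs are skipped as ambiguous
    """
    pairs = []
    for (seq_a, score_a), (seq_b, score_b) in combinations(candidates, 2):
        if abs(score_a - score_b) < min_gap:
            continue  # scores too close to express a preference
        if score_a > score_b:
            pairs.append((seq_a, seq_b))
        else:
            pairs.append((seq_b, seq_a))
    return pairs
```

For example, with candidates scoring 90, 88, and 70, only the two pairs involving the 70-scoring sequence would be kept at `min_gap=5.0`.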

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrate DPO with AlphaFold pLDDT scores

Introduce ResiDPO for residue-level optimization

Fine-tune LigandMPNN with ResiDPO to obtain EnhancedMPNN
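
The residue-level idea listed above can be illustrated with a hypothetical loss that treats each position separately. This is only a sketch in the spirit of ResiDPO, not the paper's objective: the `keep_threshold`, the squared-deviation penalty used to preserve confident positions, and all names are assumptions.

```python
import math

def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def residue_level_loss(lp_win, lp_lose, ref_win, ref_lose,
                       plddt_win, plddt_lose,
                       beta=0.1, keep_threshold=80.0, keep_weight=1.0):
    """Per-residue preference loss over one (preferred, rejected) pair.

    lp_* / ref_* : per-residue log-probabilities under the fine-tuned
                   and reference models (equal-length lists)
    plddt_*      : per-residue pLDDT of the two predicted structures
    """
    total = 0.0
    for pw, pl, rw, rl, cw, cl in zip(lp_win, lp_lose, ref_win, ref_lose,
                                      plddt_win, plddt_lose):
        if cw >= keep_threshold and cl >= keep_threshold:
            # Already confident in both designs: keep the policy
            # close to the reference at this position.
            total += keep_weight * (pw - rw) ** 2
        elif cw > cl:
            # Structurally weaker in the rejected design: apply a
            # DPO-style preference term to this residue only.
            total += neg_log_sigmoid(beta * ((pw - rw) - (pl - rl)))
        # Positions where the preferred design is not locally better are skipped.
    return total / len(lp_win)
```

Because each position contributes its own term, a residue that is already well predicted does not receive gradient from the preference term of a weak residue elsewhere in the chain, which is the intuition behind decoupling optimization across residues.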
