🤖 AI Summary
This work addresses the challenge of designing protein sequences that simultaneously achieve high structural recoverability and favorable developability properties, such as solubility and thermal stability, without relying on extensive manual hyperparameter tuning or task-specific retraining. The authors propose ProtAlign, a framework that fine-tunes pretrained inverse folding models (applied here to ProteinMPNN) using multi-objective preference pairs constructed with in silico property predictors. By combining semi-online Direct Preference Optimization (DPO) with a flexible preference margin, ProtAlign enables efficient fine-tuning that balances competing objectives without per-task tuning of multiple hyperparameters. The resulting model, MoMPNN, improves developability across diverse design scenarios, including CATH 4.3 crystal structures, de novo backbones, and real-world binder designs, while maintaining high structural fidelity.
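One way to picture how property predictors can yield multi-objective preference pairs is Pareto dominance. The sketch below assumes each candidate sequence has already been scored by predictors (higher is better on every objective) and keeps a pair only when one sequence is at least as good on every objective and strictly better on at least one. The `Candidate` class, the objective names, and the dominance rule are illustrative assumptions, not ProtAlign's documented pairing criterion.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    sequence: str
    # Hypothetical objective name -> predicted score (higher is better).
    scores: dict


def dominates(a: Candidate, b: Candidate, objectives: list) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    at_least_as_good = all(a.scores[k] >= b.scores[k] for k in objectives)
    strictly_better = any(a.scores[k] > b.scores[k] for k in objectives)
    return at_least_as_good and strictly_better


def build_preference_pairs(candidates: list, objectives: list) -> list:
    """Return (chosen, rejected) pairs in which `chosen` Pareto-dominates `rejected`.

    Pairs where neither candidate dominates (a trade-off between objectives)
    are skipped, so no pair sends a conflicting preference signal.
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        if dominates(a, b, objectives):
            pairs.append((a, b))
        elif dominates(b, a, objectives):
            pairs.append((b, a))
    return pairs


if __name__ == "__main__":
    objectives = ["designability", "solubility"]  # hypothetical objective names
    candidates = [
        Candidate("AAA", {"designability": 0.90, "solubility": 0.5}),
        Candidate("BBB", {"designability": 0.80, "solubility": 0.4}),
        Candidate("CCC", {"designability": 0.95, "solubility": 0.3}),
    ]
    for chosen, rejected in build_preference_pairs(candidates, objectives):
        print(chosen.sequence, ">", rejected.sequence)
```

Here only `AAA` over `BBB` survives; `CCC` trades solubility for designability, so it is never paired. A strict dominance rule is one simple way to avoid conflicting preference signals, at the cost of discarding many pairs.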
📝 Abstract
Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc mutation, inference-time biasing, or retraining on property-specific subsets, yet they are target-dependent and demand substantial domain expertise or careful hyperparameter tuning. In this paper, we introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity. ProtAlign constructs preference pairs using in silico property predictors and employs a semi-online Direct Preference Optimization (DPO) strategy with a flexible preference margin to mitigate conflicts among competing objectives. Applied to the widely used ProteinMPNN backbone, the resulting model, MoMPNN, enhances developability without compromising designability across tasks including sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios, making ProtAlign an appealing framework for practical protein sequence design.
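The preference-optimization step can be illustrated with the standard DPO loss plus an additive margin. This sketch assumes each log-probability is the summed per-residue log-likelihood of a full sequence given the backbone, under either the model being fine-tuned (the policy) or the frozen pretrained model (the reference). The `flexible_margin` rule, which scales the margin with a property-score gap, is a hypothetical stand-in: the abstract does not specify how ProtAlign sets its margin. Setting `margin=0` recovers standard DPO.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin_loss(policy_logp_chosen: float, policy_logp_rejected: float,
                    ref_logp_chosen: float, ref_logp_rejected: float,
                    beta: float = 0.1, margin: float = 0.0) -> float:
    """DPO loss for a single preference pair with an additive margin.

    loss = -log sigmoid(beta * (chosen log-ratio - rejected log-ratio) - margin)
    A positive margin demands that the chosen sequence be favored by more
    before the loss becomes small; margin = 0 is the standard DPO objective.
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_logratio - rejected_logratio) - margin
    return -log_sigmoid(logits)


def flexible_margin(score_gap: float, scale: float = 1.0) -> float:
    """Hypothetical margin rule: grow the margin with the chosen-minus-rejected score gap."""
    return scale * max(score_gap, 0.0)


if __name__ == "__main__":
    # Policy already prefers the chosen sequence slightly more than the reference does.
    loss_plain = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1)
    loss_margin = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1,
                                  margin=flexible_margin(0.3))
    print(f"standard DPO: {loss_plain:.4f}, with margin: {loss_margin:.4f}")
```

In a semi-online setup, the pairs fed to this loss would presumably be refreshed periodically from sequences sampled by the current model and rescored, rather than fixed once before training.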