Protein Inverse Folding From Structure Feedback

📅 2025-06-03

🤖 AI Summary

This work addresses the protein inverse folding problem: designing amino acid sequences that fold into a given target 3D structure. We propose a closed-loop optimization framework that introduces Direct Preference Optimization (DPO) to this task for the first time. The framework iterates over four stages: sampling candidate sequences from the inverse folding model, predicting the structure of each candidate with a protein folding model, deriving pairwise preference labels from structural similarity to the target (e.g., TM-Score), and fine-tuning the inverse folding model on those labels under the DPO objective. On the CATH 4.2 test set, the method raises the mean TM-Score from 0.77 to 0.81 (+0.04 over the baseline) and also improves sequence recovery. On hard-to-fold targets, multi-iteration refinement yields an average relative TM-Score improvement of 79.5% over the baseline model. The result is a feedback-driven approach to structure-aware sequence design.
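The preference-labeling stage can be illustrated with a minimal sketch. It assumes each sampled sequence already has a TM-Score against the target structure. The function name `build_preference_pairs` and the `margin` threshold are hypothetical choices for illustration and are not taken from the paper.

```python
from itertools import combinations


def build_preference_pairs(scored, margin=0.0):
    """Turn (sequence, tm_score) tuples into (chosen, rejected) pairs.

    `margin` is an assumed filter: pairs whose TM-Scores differ by
    `margin` or less are skipped because the preference is ambiguous.
    """
    pairs = []
    for (seq_a, tm_a), (seq_b, tm_b) in combinations(scored, 2):
        if abs(tm_a - tm_b) <= margin:
            continue  # too close to call, no label emitted
        # The sequence whose predicted structure is closer to the target wins.
        pairs.append((seq_a, seq_b) if tm_a > tm_b else (seq_b, seq_a))
    return pairs
```

For example, calling it on three candidates with TM-Scores 0.80, 0.50, and 0.79 and `margin=0.05` keeps the two clear comparisons against the 0.50 candidate and drops the near-tie.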

📝 Abstract

The inverse folding problem, which aims to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference Optimization (DPO) to fine-tune an inverse folding model using feedback from a protein folding model. Given a target protein structure, we begin by sampling candidate sequences from the inverse folding model, then predict the three-dimensional structure of each sequence with the folding model to generate pairwise structural-preference labels. These labels are used to fine-tune the inverse folding model under the DPO objective. Our results on the CATH 4.2 test set demonstrate that DPO fine-tuning not only improves the sequence recovery of baseline models but also raises the average TM-Score from 0.77 to 0.81, indicating enhanced structural similarity. Furthermore, iterative application of our DPO-based method on challenging protein structures yields substantial gains, with an average TM-Score increase of 79.5% relative to the baseline model. This work establishes a promising direction for improving protein sequence design from structure feedback through preference optimization.
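For reference, the standard DPO objective on a single preference pair can be sketched as follows. This is the generic published form of the loss, written in plain Python. The function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions for illustration, and the paper's actual hyperparameters and implementation are not given here. Each log-probability is the summed log-likelihood of a whole sequence under either the model being fine-tuned (policy) or a frozen copy of the baseline (reference).

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) sequence pair."""
    # How much more the policy favors each sequence than the reference does.
    chosen_gain = policy_logp_chosen - ref_logp_chosen
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) computed stably as softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy equals the reference the loss is log 2, and it falls as the policy shifts probability toward the sequence whose predicted structure scored higher.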

Problem

Research questions and friction points this paper is trying to address.

Design amino acid sequences for desired 3D protein structures

Optimize inverse folding using feedback from folding models

Improve sequence recovery and structure similarity metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

DPO fine-tunes inverse-folding model

Uses folding model feedback

Iterative method improves TM-Score
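The iterative procedure can be sketched as a loop skeleton. Everything here is hypothetical scaffolding: `sample_fn`, `score_fn`, and `update_fn` stand in for the inverse folding sampler, the folding-model-plus-TM-Score evaluation, and a DPO fine-tuning step, none of which are specified in this summary. Pairing the best candidate against every lower-scoring one is also an assumed strategy, not the paper's.

```python
def iterative_refinement(model, target, sample_fn, score_fn, update_fn,
                         rounds=3, n_samples=8):
    """Repeat sample -> score -> build pairs -> update for `rounds` rounds."""
    history = []  # mean score of the samples drawn in each round
    for _ in range(rounds):
        seqs = sample_fn(model, target, n_samples)
        scored = [(seq, score_fn(seq, target)) for seq in seqs]
        history.append(sum(tm for _, tm in scored) / len(scored))
        ranked = sorted(scored, key=lambda item: item[1], reverse=True)
        best_seq, best_tm = ranked[0]
        # Assumed pairing rule: best candidate vs. each strictly worse one.
        pairs = [(best_seq, seq) for seq, tm in ranked[1:] if tm < best_tm]
        model = update_fn(model, pairs)  # e.g. one DPO fine-tuning pass
    return model, history
```

Tracking `history` makes it easy to check whether the mean score actually rises from round to round on a given target.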
