🤖 AI Summary
This work addresses protein sequence engineering, the problem of finding high-fitness protein sequences starting from a wild-type sequence, by proposing a zero-shot optimization framework that requires no fine-tuning on biological data. Methodologically, it is the first to uncover the latent protein sequence optimization capability of large language models (LLMs), combining task-oriented prompt engineering, multi-objective Pareto frontier search, and iterative sampling with reweighting under experimental budget constraints to perform efficient directed evolution from wild-type sequences. Its key contribution is breaking the conventional paradigm of fine-tuning on biological data, instead transferring knowledge across modalities to protein sequence design. Experiments on multiple synthetic and experimental fitness landscapes show that the method discovers high-fitness sequences at a significantly higher rate, with fewer mutations and fewer experimental rounds, than state-of-the-art baselines.
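To make the loop concrete, below is a minimal sketch of a budget-constrained directed evolution round of the kind the summary describes. Everything here is an illustrative assumption rather than the authors' code: `mock_llm_propose` stands in for a prompted LLM variant proposer, and `mock_fitness` stands in for an experimental fitness measurement.

```python
import random

def mock_llm_propose(parent: str, n: int) -> list[str]:
    """Stand-in for an LLM prompted to suggest variants of a sequence.
    In the paper's setting this would be an LLM call driven by a
    task-oriented prompt; here we apply random single-point mutations."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    variants = []
    for _ in range(n):
        pos = random.randrange(len(parent))
        variants.append(parent[:pos] + random.choice(alphabet) + parent[pos + 1:])
    return variants

def mock_fitness(seq: str) -> float:
    """Stand-in for an experimental fitness oracle (hypothetical)."""
    return sum(seq.count(a) for a in "AG") / len(seq)

def directed_evolution(wild_type: str, budget: int, batch: int = 8) -> str:
    """Iteratively propose variants and keep the fittest seen so far,
    stopping when the experimental budget (measured sequences) is spent."""
    best_seq, best_fit = wild_type, mock_fitness(wild_type)
    spent = 1  # the wild type itself counts as one measurement
    while spent + batch <= budget:
        candidates = mock_llm_propose(best_seq, batch)
        spent += batch
        for seq in candidates:
            fit = mock_fitness(seq)
            if fit > best_fit:
                best_seq, best_fit = seq, fit
    return best_seq

print(directed_evolution("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", budget=100))
```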
📝 Abstract
We consider the protein sequence engineering problem, which aims to find protein sequences with high fitness levels, starting from a given wild-type sequence. Directed evolution has been the dominant paradigm in this field: an iterative process that generates variants and selects among them via experimental feedback. We demonstrate that large language models (LLMs), despite being trained on massive texts, are secretly protein sequence optimizers. Embedded in a directed evolution loop, LLMs can perform protein engineering through Pareto-optimal and experiment-budget constrained optimization, demonstrating success on both synthetic and experimental fitness landscapes.
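As a complement to the loop above, here is a minimal sketch of the Pareto-constrained selection step the abstract mentions, assuming each candidate sequence is scored on several objectives to be maximized (for instance, fitness and negated mutation count). The dominance rule is the standard multi-objective definition; the sequence names and scores are hypothetical, not data from the paper.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if score vector `a` is at least as good as `b` on every
    objective and strictly better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored: dict[str, tuple[float, ...]]) -> list[str]:
    """Keep only sequences whose score vectors are not dominated."""
    return [
        s for s, sc in scored.items()
        if not any(dominates(other, sc) for o, other in scored.items() if o != s)
    ]

# Hypothetical scores: (fitness, -mutation_count), both to maximize.
scores = {"SEQ1": (0.9, -3.0), "SEQ2": (0.7, -1.0), "SEQ3": (0.6, -4.0)}
print(pareto_front(scores))  # SEQ1 and SEQ2 survive; SEQ3 is dominated
```

Keeping the non-dominated set rather than a single best sequence lets the loop trade off fitness against mutational distance from the wild type when choosing what to measure next.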