Computational Protein Science in the Era of Large Language Models (LLMs)

๐Ÿ“… 2025-01-17
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Traditional AI models exhibit limitations in protein sequence understanding and cross-task generalization. Method: This study systematically reviews large language model (LLM)-driven computational protein science, introducing a knowledge-dimensional taxonomy for protein language models (pLMs)โ€”categorizing them by sequence patterns, structure-function relationships, and interdisciplinary scientific language. It proposes three paradigmatic advances: multi-task generalization, cross-modal reasoning, and end-to-end design. Methodologically, it integrates Transformer architectures with multimodal knowledge fusion, prompt engineering, transfer learning, and joint structure-function modeling. Contribution/Results: Experiments demonstrate significant improvements in structural prediction accuracy, functional annotation coverage, and de novo protein design success rates. The framework achieves multiple wet-lab validations in antibody/enzyme design and novel drug discovery, establishing pLMs as foundational tools for next-generation computational protein science.

Technology Category

Application Category

๐Ÿ“ Abstract
Considering the significance of proteins, computational protein science has always been a critical scientific field, dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm. In the last few decades, Artificial Intelligence (AI) has made significant impacts in computational protein science, leading to notable successes in specific protein modeling tasks. However, those previous AI models still meet limitations, such as the difficulty in comprehending the semantics of protein sequences, and the inability to generalize across a wide range of protein modeling tasks. Recently, LLMs have emerged as a milestone in AI due to their unprecedented language processing&generalization capability. They can promote comprehensive progress in fields rather than solving individual tasks. As a result, researchers have actively introduced LLM techniques in computational protein science, developing protein Language Models (pLMs) that skillfully grasp the foundational knowledge of proteins and can be effectively generalized to solve a diversity of sequence-structure-function reasoning problems. While witnessing prosperous developments, it's necessary to present a systematic overview of computational protein science empowered by LLM techniques. First, we summarize existing pLMs into categories based on their mastered protein knowledge, i.e., underlying sequence patterns, explicit structural and functional information, and external scientific languages. Second, we introduce the utilization and adaptation of pLMs, highlighting their remarkable achievements in promoting protein structure prediction, protein function prediction, and protein design studies. Then, we describe the practical application of pLMs in antibody design, enzyme design, and drug discovery. Finally, we specifically discuss the promising future directions in this fast-growing field.
Problem

Research questions and friction points this paper is trying to address.

Protein Language Models
Biological Applications
Artificial Intelligence in Biotechnology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Protein Language Models
Predictive Capabilities
Protein Science Advancements
๐Ÿ”Ž Similar Papers
No similar papers found.
W
Wenqi Fan
Department of Computing and Department of Management and Marketing, The Hong Kong Polytechnic University
Y
Yi Zhou
Department of Computing, The Hong Kong Polytechnic University
S
Shijie Wang
Department of Computing, The Hong Kong Polytechnic University
Yuyao Yan
Yuyao Yan
Xi'an Jiaotong-Liverpool University
H
Hui Liu
Michigan State University
Q
Qian Zhao
Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University
Le Song
Le Song
CTO, GenBio AI; Professor, MBZUAI
AIAI for ScienceMachine Learning
Q
Qing Li
Department of Computing, The Hong Kong Polytechnic University